Re: Race condition in TransactionIdIsInProgress
On 25/06/2022 13:10, Simon Riggs wrote: On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas wrote: On 24/06/2022 04:43, Andres Freund wrote: On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: In summary, I think we should: - commit and backpatch Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch - fix pg_xact_status() to check TransactionIdIsInProgress() - in v15, remove TransationIdIsKnownCompleted function altogether I'll try to get around to that in the next few days, unless someone beats me to it. Makes sense. This is what I came up with for master. One difference from Simon's original patch is that I decided to not expose the TransactionIdIsKnownNotInProgress(), as there are no other callers of it in core, and it doesn't seem useful to extensions. I inlined it into the caller instead. Looks good, thanks. Committed and backpatched. Thanks! - Heikki
Re: Race condition in TransactionIdIsInProgress
On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas wrote: > > On 24/06/2022 04:43, Andres Freund wrote: > > On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: > >> In summary, I think we should: > >> - commit and backpatch Simon's > >> just_remove_TransactionIdIsKnownCompleted_call.v1.patch > >> - fix pg_xact_status() to check TransactionIdIsInProgress() > >> - in v15, remove TransationIdIsKnownCompleted function altogether > >> > >> I'll try to get around to that in the next few days, unless someone beats > >> me > >> to it. > > > > Makes sense. > > This is what I came up with for master. One difference from Simon's > original patch is that I decided to not expose the > TransactionIdIsKnownNotInProgress(), as there are no other callers of it > in core, and it doesn't seem useful to extensions. I inlined it into the > caller instead. Looks good, thanks. > BTW, should we worry about XID wraparound with the cache? Could you have > a backend sit idle for 2^32 transactions, without making any > TransactionIdIsKnownNotInProgress() calls? That's not new with this > patch, though, it could happen with the single-item cache in transam.c, too. Yes, as a separate patch only in PG15+, imho. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: Race condition in TransactionIdIsInProgress
On 24/06/2022 04:43, Andres Freund wrote: On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: In summary, I think we should: - commit and backpatch Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch - fix pg_xact_status() to check TransactionIdIsInProgress() - in v15, remove TransationIdIsKnownCompleted function altogether I'll try to get around to that in the next few days, unless someone beats me to it. Makes sense. This is what I came up with for master. One difference from Simon's original patch is that I decided to not expose the TransactionIdIsKnownNotInProgress(), as there are no other callers of it in core, and it doesn't seem useful to extensions. I inlined it into the caller instead. BTW, should we worry about XID wraparound with the cache? Could you have a backend sit idle for 2^32 transactions, without making any TransactionIdIsKnownNotInProgress() calls? That's not new with this patch, though, it could happen with the single-item cache in transam.c, too. - HeikkiFrom de81639af2ce0ed635778d4f7cafb3dfc4fd84f5 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Sat, 25 Jun 2022 12:02:39 +0300 Subject: [PATCH 1/1] Fix visibility check when XID is committed in CLOG but not in procarray. TransactionIdIsInProgress had a fast path to return 'false' if the single-item CLOG cache said that the transaction was known to be committed. However, that was wrong, because a transaction is first marked as committed in the CLOG but doesn't become visible to others until it has removed its XID from the proc array. That could lead to an error: ERROR: t_xmin is uncommitted in tuple to be updated or for an UPDATE to go ahead without blocking, before the previous UPDATE on the same row was made visible. The window is usually very short, but synchronous replication makes it much wider, because the wait for synchronous replica happens in that window. Another thing that makes it hard to hit is that it's hard to get such a commit-in-progress transaction into the single item CLOG cache. Normally, if you call TransactionIdIsInProgress on such a transaction, it determines that the XID is in progress without checking the CLOG and without populating the cache. One way to prime the cache is to explicitly call pg_xact_status() on the XID. Another way is use a lot of subtransactions, so that the subxid cache in the proc array is overflown, making TransactionIdIsInProgress rely on CLOG checks. This has been broken ever since it was introduced in 2008, but the race condition is very hard to hit, especially without synchronous replication. There were a couple of reports of this starting from summer 2021, but no one was able to find the root cause then. TransactionIdIsKnownCompleted() is now unused. In 'master', remove it, but I left in place in backbranches in case it's used by extensions. Also change pg_xact_status() to check TransactionIdIsInProgress(). Previously, it only checked the CLOG, and returned "committed" before the transaction was actually made visible to other queries. Note that this also means that you cannot use pg_xact_status() to reproduce the bug anymore, even if the code wasn't fixed. Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with the pg_xact_status() change added by me. Author: Simon Riggs Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru --- src/backend/access/transam/transam.c | 27 - src/backend/storage/ipc/procarray.c | 12 ++- src/backend/utils/adt/xid8funcs.c| 30 +++- src/include/access/transam.h | 1 - 4 files changed, 23 insertions(+), 47 deletions(-) diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c index dbc5f884e8..5865810135 100644 --- a/src/backend/access/transam/transam.c +++ b/src/backend/access/transam/transam.c @@ -219,33 +219,6 @@ TransactionIdDidAbort(TransactionId transactionId) return false; } -/* - * TransactionIdIsKnownCompleted - * True iff transaction associated with the identifier is currently - * known to have either committed or aborted. - * - * This does NOT look into pg_xact but merely probes our local cache - * (and so it's not named TransactionIdDidComplete, which would be the - * appropriate name for a function that worked that way). The intended - * use is just to short-circuit TransactionIdIsInProgress calls when doing - * repeated heapam_visibility.c checks for the same XID. If this isn't - * extremely fast then it will be counterproductive. - * - * Note: - * Assumes transaction identifier is valid. - */ -bool -TransactionIdIsKnownCompleted(TransactionId transactionId) -{ - if (TransactionIdEquals(transactionId, cachedFetchXid)) - { - /* If it's in the cache at all, it must be completed. */ - return true; - } - - return false; -} - /* * TransactionIdCommitTree * Marks the given transaction
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote: > On 12/02/2022 05:42, Andres Freund wrote: > > The only reason I'm so far not succeeding in turning it into an > > isolationtester spec is that a transaction waiting for SyncRep doesn't count > > as waiting for isolationtester. > > > > Basically > > > > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; > > S2: SELECT pg_xact_status($xid); > > S2: UPDATE; > > > > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid > > cache, which then causes the TransactionIdIsInProgress() to take the early > > return path, despite the transaction still being in progress. Which then > > allows the update to proceed, despite the S1 not having "properly committed" > > yet. > > I started to improve isolationtester, to allow the spec file to specify a > custom query to check for whether the backend is blocked. But then I > realized that it's not quite enough: there is currently no way write the > "$xid = txid_current()" step either. Isolationtester doesn't have variables > like that. Also, you need to set synchronous_standby_names to make it block, > which seems a bit weird in an isolation test. I don't think we strictly need $xid - it likely can be implemented via something like SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE application_name = 'testname/s1'; > I wish we had an explicit way to inject delays or failures. We discussed it > before > (https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), > but I don't want to pick up that task right now. Understandable :( > Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if > this is worth committing, the pg_xact_status() trick is quite special for > this particular bug. Yea, not sure either. > Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good > to me. Replying to the discussion on that: > > On 12/02/2022 23:50, Andres Freund wrote: > > On 2022-02-12 13:25:58 +, Simon Riggs wrote: > > > I'm not sure it is possible to remove TransactionIdIsKnownCompleted() > > > in backbranches. > > > > I think it'd be fine if we needed to. Or we could just make it always return > > false or such. > > > > > > > > > just removes the known offending call, passes make check, but IMHO > > > > > leaves the same error just as likely by other callers. > > > > > > > > Which other callers are you referring to? > > > > > > I don't know of any, but we can't just remove a function in a > > > backbranch, can we? > > > > We have in the past, if it's a "sufficiently internal" function. > > I think we should remove it in v15, and leave it as it is in backbranches. > Just add a comment to point out that the name is a bit misleading, because > it checks the clog rather than the proc array. It's not inherently > dangerous, and someone might have a legit use case for it. True, someone > might also be doing this incorrect thing with it, but still seems like the > right balance to me. > > I think we also need to change pg_xact_status(), to also call > TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). > Currently, if it returns "committed", and you start a new transaction, the > new transaction might not yet see the effects of that "committed" > transaction. With that, you cannot reproduce the original bug with > pg_xact_status() though, so there's no point in committing that test then. > > In summary, I think we should: > - commit and backpatch Simon's > just_remove_TransactionIdIsKnownCompleted_call.v1.patch > - fix pg_xact_status() to check TransactionIdIsInProgress() > - in v15, remove TransationIdIsKnownCompleted function altogether > > I'll try to get around to that in the next few days, unless someone beats me > to it. Makes sense. Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
On 12/02/2022 05:42, Andres Freund wrote: On 2022-02-11 16:41:24 -0800, Andres Freund wrote: FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench r/w workload that's been modified to start 70 savepoints at the start shows pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table "pgbench_branches" pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table "pgbench_branches" pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches" after a few minutes of running with a local, not slowed down, syncrep. Without any other artifical slowdowns or such. And this can easily be triggered even without subtransactions, in a completely reliable way. The only reason I'm so far not succeeding in turning it into an isolationtester spec is that a transaction waiting for SyncRep doesn't count as waiting for isolationtester. Basically S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; S2: SELECT pg_xact_status($xid); S2: UPDATE; suffices, because the pg_xact_status() causes an xlog fetch, priming the xid cache, which then causes the TransactionIdIsInProgress() to take the early return path, despite the transaction still being in progress. Which then allows the update to proceed, despite the S1 not having "properly committed" yet. I started to improve isolationtester, to allow the spec file to specify a custom query to check for whether the backend is blocked. But then I realized that it's not quite enough: there is currently no way write the "$xid = txid_current()" step either. Isolationtester doesn't have variables like that. Also, you need to set synchronous_standby_names to make it block, which seems a bit weird in an isolation test. I wish we had an explicit way to inject delays or failures. We discussed it before (https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), but I don't want to pick up that task right now. Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if this is worth committing, the pg_xact_status() trick is quite special for this particular bug. Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good to me. Replying to the discussion on that: On 12/02/2022 23:50, Andres Freund wrote: On 2022-02-12 13:25:58 +, Simon Riggs wrote: I'm not sure it is possible to remove TransactionIdIsKnownCompleted() in backbranches. I think it'd be fine if we needed to. Or we could just make it always return false or such. just removes the known offending call, passes make check, but IMHO leaves the same error just as likely by other callers. Which other callers are you referring to? I don't know of any, but we can't just remove a function in a backbranch, can we? We have in the past, if it's a "sufficiently internal" function. I think we should remove it in v15, and leave it as it is in backbranches. Just add a comment to point out that the name is a bit misleading, because it checks the clog rather than the proc array. It's not inherently dangerous, and someone might have a legit use case for it. True, someone might also be doing this incorrect thing with it, but still seems like the right balance to me. I think we also need to change pg_xact_status(), to also call TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). Currently, if it returns "committed", and you start a new transaction, the new transaction might not yet see the effects of that "committed" transaction. With that, you cannot reproduce the original bug with pg_xact_status() though, so there's no point in committing that test then. In summary, I think we should: - commit and backpatch Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch - fix pg_xact_status() to check TransactionIdIsInProgress() - in v15, remove TransationIdIsKnownCompleted function altogether I'll try to get around to that in the next few days, unless someone beats me to it. - Heikkidiff --git a/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl new file mode 100644 index 00..8a651c0c60 --- /dev/null +++ b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl @@ -0,0 +1,86 @@ +# Test for visibility checks, when an updating transaction has already +# written commit record but has not yet removed its entry from the +# proc array, and thus isn't yet visible to other transactions. +# +# In particular, this reproduced an old bug in that, see: +# https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru + +use strict; +use warnings; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use T
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-12 13:25:58 +, Simon Riggs wrote: > On Fri, 11 Feb 2022 at 19:08, Andres Freund wrote: > > > On 2022-02-11 13:48:59 +, Simon Riggs wrote: > > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs > > > wrote: > > > > > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > > > > > > > > > Looks lik syncrep will make this a lot worse, because it can > > > > > drastically > > > > > increase the window between the TransactionIdCommitTree() and > > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. > > > > > But at > > > > > least it might make it easier to write tests exercising this > > > > > scenario... > > > > > > > > Agreed > > > > > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > > > item cache is set too early in some cases. The single item cache is > > > > important for performance, so we just need to be more careful about > > > > setting the cache. > > > > > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > > > being set, but this fails! Not worked out why, yet. > > > > I don't think it's safe to check !TransactionIdKnownNotInProgress() in > > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to > > do TransactionLogFetch() internally. > > That's not correct because you're confusing > TransactionIdKnownNotInProgress(), which is a new function introduced > by the patch, with the existing function > TransactionIdIsKnownCompleted(). I don't think so. This call of the new TransactionIdKnownNotInProgress() > --- a/src/backend/access/transam/transam.c > +++ b/src/backend/access/transam/transam.c > @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId) > return TRANSACTION_STATUS_ABORTED; > } > > + /* > + * Safeguard that we have called TransactionIsIdInProgress() before > + * checking commit log manager, to ensure that we do not cache the > + * result until the xid is no longer in the procarray at eoxact. > + */ > + if (!TransactionIdKnownNotInProgress(transactionId)) > + return TRANSACTION_STATUS_IN_PROGRESS; > + > /* >* Get the transaction status. >*/ isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the (sub)transaction aborted. Which explains why your first patch doesn't work. > > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch > > > > I think this might contain a different diff than what the name suggests? > > Not at all, please don't be confused by the name. The patch removes > the call to TransactionIdIsKnownCompleted() from > TransactionIdIsInProgress(). I guess for me "just remove" doesn't include adding a new cache... > I'm not sure it is possible to remove TransactionIdIsKnownCompleted() > in backbranches. I think it'd be fine if we needed to. Or we could just make it always return false or such. > > > just removes the known offending call, passes make check, but IMHO > > > leaves the same error just as likely by other callers. > > > > Which other callers are you referring to? > > I don't know of any, but we can't just remove a function in a > backbranch, can we? We have in the past, if it's a "sufficiently internal" function. Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
On Fri, 11 Feb 2022 at 19:08, Andres Freund wrote: > On 2022-02-11 13:48:59 +, Simon Riggs wrote: > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs > > wrote: > > > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > > > > > > > Looks lik syncrep will make this a lot worse, because it can drastically > > > > increase the window between the TransactionIdCommitTree() and > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. > > > > But at > > > > least it might make it easier to write tests exercising this scenario... > > > > > > Agreed > > > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > > item cache is set too early in some cases. The single item cache is > > > important for performance, so we just need to be more careful about > > > setting the cache. > > > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > > being set, but this fails! Not worked out why, yet. > > I don't think it's safe to check !TransactionIdKnownNotInProgress() in > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to > do TransactionLogFetch() internally. That's not correct because you're confusing TransactionIdKnownNotInProgress(), which is a new function introduced by the patch, with the existing function TransactionIdIsKnownCompleted(). > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch > > I think this might contain a different diff than what the name suggests? Not at all, please don't be confused by the name. The patch removes the call to TransactionIdIsKnownCompleted() from TransactionIdIsInProgress(). I'm not sure it is possible to remove TransactionIdIsKnownCompleted() in backbranches. > > just removes the known offending call, passes make check, but IMHO > > leaves the same error just as likely by other callers. > > Which other callers are you referring to? I don't know of any, but we can't just remove a function in a backbranch, can we? > To me it seems that the "real" reason behind this specific issue being > nontrivial to fix and behind the frequent error of calling > TransactionIdDidCommit() before TransactionIdIsInProgress() is > that we've really made a mess out of the transaction status determination code > / API. IMO the original sin is requiring the complicated > > if (TransactionIdIsCurrentTransactionId(xid) > ... > else if (TransactionIdIsInProgress(xid) > ... > else if (TransactionIdDidCommit(xid) > ... > > dance at pretty much any place checking transaction status. Agreed, it's pretty weird to have to call the functions in the right order or you get the wrong answer. Especially since we have no cross-check to ensure the correct sequence was followed. > The multiple calls > for the same xid is, I think, what pushed the caching down to the > TransactionLogFetch level. > > ISTM that we should introduce something like TransactionIdStatus(xid) that > returns > XACT_STATUS_CURRENT > XACT_STATUS_IN_PROGRESS > XACT_STATUS_COMMITTED > XACT_STATUS_ABORTED > accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks > XidInMVCCSnapshot() instead of TransactionIdIsInProgress(). > > I think that'd also make visibility checks faster - we spend a good chunk of > cycles doing unnecessary function calls and repeatedly gathering information. > > > It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more > optimized - TransactionIdIsInProgress() knows it doesn't need to check > anything before RecentXmin, but TransactionIdIsCurrentTransactionId() > doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is > smaller than GetTopTransactionIdIfAny() before bsearching through subxids. > > Of course anything along these lines would never be backpatchable... Agreed I've presented a simple patch intended for backpatching. You may not like the name, but please have another look at "just_remove_TransactionIdIsKnownCompleted_call.v1.patch". I believe it is backpatchable with minimal impact and without loss of performance. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-11 16:41:24 -0800, Andres Freund wrote: > FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench > r/w workload that's been modified to start 70 savepoints at the start shows > > pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: > t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table > "pgbench_branches" > pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: > t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table > "pgbench_branches" > pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: > t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table > "pgbench_branches" > > after a few minutes of running with a local, not slowed down, syncrep. Without > any other artifical slowdowns or such. And this can easily be triggered even without subtransactions, in a completely reliable way. The only reason I'm so far not succeeding in turning it into an isolationtester spec is that a transaction waiting for SyncRep doesn't count as waiting for isolationtester. Basically S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; S2: SELECT pg_xact_status($xid); S2: UPDATE; suffices, because the pg_xact_status() causes an xlog fetch, priming the xid cache, which then causes the TransactionIdIsInProgress() to take the early return path, despite the transaction still being in progress. Which then allows the update to proceed, despite the S1 not having "properly committed" yet. Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-10 22:11:38 -0800, Andres Freund wrote: > Looks lik syncrep will make this a lot worse, because it can drastically > increase the window between the TransactionIdCommitTree() and > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > least it might make it easier to write tests exercising this scenario... FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench r/w workload that's been modified to start 70 savepoints at the start shows pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table "pgbench_branches" pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table "pgbench_branches" pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches" after a few minutes of running with a local, not slowed down, syncrep. Without any other artifical slowdowns or such. Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-11 13:48:59 +, Simon Riggs wrote: > On Fri, 11 Feb 2022 at 08:48, Simon Riggs > wrote: > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > > > > > Looks lik syncrep will make this a lot worse, because it can drastically > > > increase the window between the TransactionIdCommitTree() and > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But > > > at > > > least it might make it easier to write tests exercising this scenario... > > > > Agreed > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > item cache is set too early in some cases. The single item cache is > > important for performance, so we just need to be more careful about > > setting the cache. > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > being set, but this fails! Not worked out why, yet. I don't think it's safe to check !TransactionIdKnownNotInProgress() in TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to do TransactionLogFetch() internally. > just_remove_TransactionIdIsKnownCompleted_call.v1.patch I think this might contain a different diff than what the name suggests? > just removes the known offending call, passes make check, but IMHO > leaves the same error just as likely by other callers. Which other callers are you referring to? To me it seems that the "real" reason behind this specific issue being nontrivial to fix and behind the frequent error of calling TransactionIdDidCommit() before TransactionIdIsInProgress() is that we've really made a mess out of the transaction status determination code / API. IMO the original sin is requiring the complicated if (TransactionIdIsCurrentTransactionId(xid) ... else if (TransactionIdIsInProgress(xid) ... else if (TransactionIdDidCommit(xid) ... dance at pretty much any place checking transaction status. The multiple calls for the same xid is, I think, what pushed the caching down to the TransactionLogFetch level. ISTM that we should introduce something like TransactionIdStatus(xid) that returns XACT_STATUS_CURRENT XACT_STATUS_IN_PROGRESS XACT_STATUS_COMMITTED XACT_STATUS_ABORTED accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks XidInMVCCSnapshot() instead of TransactionIdIsInProgress(). I think that'd also make visibility checks faster - we spend a good chunk of cycles doing unnecessary function calls and repeatedly gathering information. It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more optimized - TransactionIdIsInProgress() knows it doesn't need to check anything before RecentXmin, but TransactionIdIsCurrentTransactionId() doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is smaller than GetTopTransactionIdIfAny() before bsearching through subxids. Of course anything along these lines would never be backpatchable... Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
On Fri, 11 Feb 2022 at 08:48, Simon Riggs wrote: > > On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > > > Looks lik syncrep will make this a lot worse, because it can drastically > > increase the window between the TransactionIdCommitTree() and > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > > least it might make it easier to write tests exercising this scenario... > > Agreed > > TransactionIdIsKnownCompleted(xid) is only broken because the single > item cache is set too early in some cases. The single item cache is > important for performance, so we just need to be more careful about > setting the cache. Something like this... fix_cachedFetchXid.v1.patch prevents the cache being set, but this fails! Not worked out why, yet. just_remove_TransactionIdIsKnownCompleted_call.v1.patch just removes the known offending call, passes make check, but IMHO leaves the same error just as likely by other callers. -- Simon Riggshttp://www.EnterpriseDB.com/ fix_cachedFetchXid.v1.patch Description: Binary data just_remove_TransactionIdIsKnownCompleted_call.v1.patch Description: Binary data
Re: Race condition in TransactionIdIsInProgress
On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > Looks lik syncrep will make this a lot worse, because it can drastically > increase the window between the TransactionIdCommitTree() and > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at > least it might make it easier to write tests exercising this scenario... Agreed TransactionIdIsKnownCompleted(xid) is only broken because the single item cache is set too early in some cases. The single item cache is important for performance, so we just need to be more careful about setting the cache. -- Simon Riggshttp://www.EnterpriseDB.com/
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-10 21:56:09 -0800, Andres Freund wrote: > I think this may actually mean that the hot corruption problem fixed in > > commit 18b87b201f7 > Author: Andres Freund > Date: 2021-12-10 20:12:26 -0800 > > Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while > pruning. > > for 14/master is present in older branches too :(. Need to trace through the > HTSV and pruning logic with a bit more braincells than currently available to > be sure. On second thought, there's probably sufficiently more direct corruption this can cause than corruption via hot pruning. Not that that's not a problem, but ... Inconsistent TransactionIdIsInProgress() result can wreak havoc quite broadly. Looks lik syncrep will make this a lot worse, because it can drastically increase the window between the TransactionIdCommitTree() and ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at least it might make it easier to write tests exercising this scenario... Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-10 15:21:27 -0500, Robert Haas wrote: > On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik > wrote: > > Postgres first records state transaction in CLOG, then removes > > transaction from procarray and finally releases locks. > > But it can happen that transaction is marked as committed in CLOG, > > XMAX_COMMITTED bit is set in modified tuple but > > TransactionIdIsInProgress still returns true. As a result "t_xmin %u is > > uncommitted in tuple (%u,%u) to be updated in table" > > error is reported. > > This is exactly why the HeapTupleSatisfiesFOO() functions all test > TransactionIdIsInProgress() first and TransactionIdDidCommit() only if > it returns false. I don't think it's really legal to test > TransactionIdDidCommit() without checking TransactionIdIsInProgress() > first. And I bet that's exactly what this code is doing... The problem is that the TransactionIdDidAbort() call is coming from within TransactionIdIsInProgress(). If I understand correctly, the problem is that the xid cache stuff doesn't correctly work for subtransactions and breaks TransactionIdIsInProgress(). For the problem to occur you need an xid that is for a subtransaction that suboverflowed and that is 'clog committed' but not 'procarray committed'. If that xid is tested twice with TransactionIdIsInProgress(), you will get a bogus result on the second call within the same session. The first TransactionIdIsInProgress() will go to /* * Step 4: have to check pg_subtrans. * * At this point, we know it's either a subtransaction of one of the Xids * in xids[], or it's not running. If it's an already-failed * subtransaction, we want to say "not running" even though its parent may * still be running. So first, check pg_xact to see if it's been aborted. */ xc_slow_answer_inc(); if (TransactionIdDidAbort(xid)) return false; and fetch that xid. The TransactionLogFetch() will set cachedFetchXid, because the xid's status is TRANSACTION_STATUS_COMMITTED. Because the transaction didn't abort TransactionIdIsInProgress() will go onto the the following loop, see the toplevel xid is in progress in the procarray and return true. The second call to TransactionIdIsInProgress() will see that the fetched transaction matches the cached transactionid and return false. /* * We may have just checked the status of this transaction, so if it is * already known to be completed, we can fall out without any access to * shared memory. */ if (TransactionIdIsKnownCompleted(xid)) { xc_by_known_xact_inc(); return false; } Without any improper uses of TransactionIdDidAbort()/TransactionIdDidCommit() before TransactionIdIsInProgress(). And other sessions will still return true. And even *this* session can return true again, if another transaction is checked that ends up caching another transaction status in cachedFetchXid. It looks to me like the TransactionIdIsKnownCompleted() path in TransactionIdIsInProgress() is just entirely broken. An prior TransactionLogFetch() can't be allowed to make subsequent TransactionIdIsInProgress() calls return wrong values. If I understand the problem correctly, a SELECT pg_xact_status(clog-committed-pgproc-in-progres) can make TransactionIdIsInProgress() return wrong values too. It's not just TransactionIdIsInProgress() itself. Looks like this problem goes back all the way back to commit 611b4393f22f2bb43135501cd6b7591299b6b453 Author: Tom Lane Date: 2008-03-11 20:20:35 + Make TransactionIdIsInProgress check transam.c's single-item XID status cache before it goes groveling through the ProcArray. In situations where the same recently-committed transaction ID is checked repeatedly by tqual.c, this saves a lot of shared-memory searches. And it's cheap enough that it shouldn't hurt noticeably when it doesn't help. Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom. I think this may actually mean that the hot corruption problem fixed in commit 18b87b201f7 Author: Andres Freund Date: 2021-12-10 20:12:26 -0800 Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning. for 14/master is present in older branches too :(. Need to trace through the HTSV and pruning logic with a bit more braincells than currently available to be sure. Greetings, Andres Freund
Re: Race condition in TransactionIdIsInProgress
On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik wrote: > Postgres first records state transaction in CLOG, then removes > transaction from procarray and finally releases locks. > But it can happen that transaction is marked as committed in CLOG, > XMAX_COMMITTED bit is set in modified tuple but > TransactionIdIsInProgress still returns true. As a result "t_xmin %u is > uncommitted in tuple (%u,%u) to be updated in table" > error is reported. This is exactly why the HeapTupleSatisfiesFOO() functions all test TransactionIdIsInProgress() first and TransactionIdDidCommit() only if it returns false. I don't think it's really legal to test TransactionIdDidCommit() without checking TransactionIdIsInProgress() first. And I bet that's exactly what this code is doing... -- Robert Haas EDB: http://www.enterprisedb.com
Re: Race condition in TransactionIdIsInProgress
On 2022-Feb-10, Andrey Borodin wrote: > > On 10 Feb 2022, at 21:46, Konstantin Knizhnik wrote: > > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in > > table" > > error is reported. > > Wow, cool, that seem to be a solution to one more mysterious > corruption thread - “reporting TID/table with corruption error" [0]. > Thank you! Ooh, great find. I'll have a look at this soon. Thanks for CCing me. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "I love the Postgres community. It's all about doing things _properly_. :-)" (David Garamond)
Re: Race condition in TransactionIdIsInProgress
> On 10 Feb 2022, at 21:46, Konstantin Knizhnik wrote: > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table" > error is reported. Wow, cool, that seem to be a solution to one more mysterious corruption thread - “reporting TID/table with corruption error" [0]. Thank you! Now it’s obvious that this is not a real data corruption. Maybe let’s remove corruption error code from the error? I had been literally woken at night by this code few times in January. And do you have a plan how to fix the actual issue? Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/202108191637.oqyzrdtnheir%40alvherre.pgsql