Re: Race condition in TransactionIdIsInProgress

2022-06-26 Thread Heikki Linnakangas

On 25/06/2022 13:10, Simon Riggs wrote:

On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas  wrote:


On 24/06/2022 04:43, Andres Freund wrote:

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone beats me
to it.


Makes sense.


This is what I came up with for master. One difference from Simon's
original patch is that I decided to not expose the
TransactionIdIsKnownNotInProgress(), as there are no other callers of it
in core, and it doesn't seem useful to extensions. I inlined it into the
caller instead.


Looks good, thanks.


Committed and backpatched. Thanks!

- Heikki




Re: Race condition in TransactionIdIsInProgress

2022-06-25 Thread Simon Riggs
On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas  wrote:
>
> On 24/06/2022 04:43, Andres Freund wrote:
> > On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
> >> In summary, I think we should:
> >> - commit and backpatch Simon's
> >> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >> - fix pg_xact_status() to check TransactionIdIsInProgress()
> >> - in v15, remove TransationIdIsKnownCompleted function altogether
> >>
> >> I'll try to get around to that in the next few days, unless someone beats 
> >> me
> >> to it.
> >
> > Makes sense.
>
> This is what I came up with for master. One difference from Simon's
> original patch is that I decided to not expose the
> TransactionIdIsKnownNotInProgress(), as there are no other callers of it
> in core, and it doesn't seem useful to extensions. I inlined it into the
> caller instead.

Looks good, thanks.

> BTW, should we worry about XID wraparound with the cache? Could you have
> a backend sit idle for 2^32 transactions, without making any
> TransactionIdIsKnownNotInProgress() calls? That's not new with this
> patch, though, it could happen with the single-item cache in transam.c, too.

Yes, as a separate patch only in PG15+, imho.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Race condition in TransactionIdIsInProgress

2022-06-25 Thread Heikki Linnakangas

On 24/06/2022 04:43, Andres Freund wrote:

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone beats me
to it.


Makes sense.


This is what I came up with for master. One difference from Simon's 
original patch is that I decided to not expose the 
TransactionIdIsKnownNotInProgress(), as there are no other callers of it 
in core, and it doesn't seem useful to extensions. I inlined it into the 
caller instead.


BTW, should we worry about XID wraparound with the cache? Could you have 
a backend sit idle for 2^32 transactions, without making any 
TransactionIdIsKnownNotInProgress() calls? That's not new with this 
patch, though, it could happen with the single-item cache in transam.c, too.


- HeikkiFrom de81639af2ce0ed635778d4f7cafb3dfc4fd84f5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Sat, 25 Jun 2022 12:02:39 +0300
Subject: [PATCH 1/1] Fix visibility check when XID is committed in CLOG but
 not in procarray.

TransactionIdIsInProgress had a fast path to return 'false' if the
single-item CLOG cache said that the transaction was known to be
committed. However, that was wrong, because a transaction is first
marked as committed in the CLOG but doesn't become visible to others
until it has removed its XID from the proc array. That could lead to an
error:

ERROR:  t_xmin is uncommitted in tuple to be updated

or for an UPDATE to go ahead without blocking, before the previous
UPDATE on the same row was made visible.

The window is usually very short, but synchronous replication makes it
much wider, because the wait for synchronous replica happens in that
window.

Another thing that makes it hard to hit is that it's hard to get such
a commit-in-progress transaction into the single item CLOG cache.
Normally, if you call TransactionIdIsInProgress on such a transaction,
it determines that the XID is in progress without checking the CLOG
and without populating the cache. One way to prime the cache is to
explicitly call pg_xact_status() on the XID. Another way is use a lot
of subtransactions, so that the subxid cache in the proc array is
overflown, making TransactionIdIsInProgress rely on CLOG checks.

This has been broken ever since it was introduced in 2008, but the race
condition is very hard to hit, especially without synchronous
replication.  There were a couple of reports of this starting from
summer 2021, but no one was able to find the root cause then.

TransactionIdIsKnownCompleted() is now unused. In 'master', remove it,
but I left in place in backbranches in case it's used by extensions.

Also change pg_xact_status() to check TransactionIdIsInProgress().
Previously, it only checked the CLOG, and returned "committed" before
the transaction was actually made visible to other queries. Note that
this also means that you cannot use pg_xact_status() to reproduce the
bug anymore, even if the code wasn't fixed.

Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with
the pg_xact_status() change added by me.

Author: Simon Riggs
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
---
 src/backend/access/transam/transam.c | 27 -
 src/backend/storage/ipc/procarray.c  | 12 ++-
 src/backend/utils/adt/xid8funcs.c| 30 +++-
 src/include/access/transam.h |  1 -
 4 files changed, 23 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index dbc5f884e8..5865810135 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -219,33 +219,6 @@ TransactionIdDidAbort(TransactionId transactionId)
 	return false;
 }
 
-/*
- * TransactionIdIsKnownCompleted
- *		True iff transaction associated with the identifier is currently
- *		known to have either committed or aborted.
- *
- * This does NOT look into pg_xact but merely probes our local cache
- * (and so it's not named TransactionIdDidComplete, which would be the
- * appropriate name for a function that worked that way).  The intended
- * use is just to short-circuit TransactionIdIsInProgress calls when doing
- * repeated heapam_visibility.c checks for the same XID.  If this isn't
- * extremely fast then it will be counterproductive.
- *
- * Note:
- *		Assumes transaction identifier is valid.
- */
-bool
-TransactionIdIsKnownCompleted(TransactionId transactionId)
-{
-	if (TransactionIdEquals(transactionId, cachedFetchXid))
-	{
-		/* If it's in the cache at all, it must be completed. */
-		return true;
-	}
-
-	return false;
-}
-
 /*
  * TransactionIdCommitTree
  *		Marks the given 

Re: Race condition in TransactionIdIsInProgress

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
> On 12/02/2022 05:42, Andres Freund wrote:
> > The only reason I'm so far not succeeding in turning it into an
> > isolationtester spec is that a transaction waiting for SyncRep doesn't count
> > as waiting for isolationtester.
> >
> > Basically
> >
> > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; 
> > S2: SELECT pg_xact_status($xid);
> > S2: UPDATE;
> >
> > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
> > cache, which then causes the TransactionIdIsInProgress() to take the early
> > return path, despite the transaction still being in progress. Which then
> > allows the update to proceed, despite the S1 not having "properly committed"
> > yet.
>
> I started to improve isolationtester, to allow the spec file to specify a
> custom query to check for whether the backend is blocked. But then I
> realized that it's not quite enough: there is currently no way write the
> "$xid = txid_current()" step either. Isolationtester doesn't have variables
> like that. Also, you need to set synchronous_standby_names to make it block,
> which seems a bit weird in an isolation test.

I don't think we strictly need $xid - it likely can be implemented via
something like
  SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE 
application_name = 'testname/s1';


> I wish we had an explicit way to inject delays or failures. We discussed it
> before 
> (https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com),
> but I don't want to pick up that task right now.

Understandable :(


> Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if
> this is worth committing, the pg_xact_status() trick is quite special for
> this particular bug.

Yea, not sure either.


> Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good
> to me. Replying to the discussion on that:
>
> On 12/02/2022 23:50, Andres Freund wrote:
> > On 2022-02-12 13:25:58 +, Simon Riggs wrote:
> > > I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> > > in backbranches.
> >
> > I think it'd be fine if we needed to. Or we could just make it always return
> > false or such.
> >
> >
> > > > > just removes the known offending call, passes make check, but IMHO
> > > > > leaves the same error just as likely by other callers.
> > > >
> > > > Which other callers are you referring to?
> > >
> > > I don't know of any, but we can't just remove a function in a
> > > backbranch, can we?
> >
> > We have in the past, if it's a "sufficiently internal" function.
>
> I think we should remove it in v15, and leave it as it is in backbranches.
> Just add a comment to point out that the name is a bit misleading, because
> it checks the clog rather than the proc array.  It's not inherently
> dangerous, and someone might have a legit use case for it. True, someone
> might also be doing this incorrect thing with it, but still seems like the
> right balance to me.
>
> I think we also need to change pg_xact_status(), to also call
> TransactionIdIsInProgress() before TransactionIdDidCommit/Abort().
> Currently, if it returns "committed", and you start a new transaction, the
> new transaction might not yet see the effects of that "committed"
> transaction. With that, you cannot reproduce the original bug with
> pg_xact_status() though, so there's no point in committing that test then.
>
> In summary, I think we should:
> - commit and backpatch Simon's
> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> - fix pg_xact_status() to check TransactionIdIsInProgress()
> - in v15, remove TransationIdIsKnownCompleted function altogether
>
> I'll try to get around to that in the next few days, unless someone beats me
> to it.

Makes sense.

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-06-23 Thread Heikki Linnakangas

On 12/02/2022 05:42, Andres Freund wrote:

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3853739 
is uncommitted in tuple (2,159) to be updated in table "pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3954305 
is uncommitted in tuple (2,58) to be updated in table "pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 4017908 
is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.


And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; 
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.


I started to improve isolationtester, to allow the spec file to specify 
a custom query to check for whether the backend is blocked. But then I 
realized that it's not quite enough: there is currently no way write the 
  "$xid = txid_current()" step either. Isolationtester doesn't have 
variables like that. Also, you need to set synchronous_standby_names to 
make it block, which seems a bit weird in an isolation test.


I wish we had an explicit way to inject delays or failures. We discussed 
it before 
(https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), 
but I don't want to pick up that task right now.


Anyway, I wrote a TAP test for this instead. See attached. I'm not sure 
if this is worth committing, the pg_xact_status() trick is quite special 
for this particular bug.


Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks 
good to me. Replying to the discussion on that:


On 12/02/2022 23:50, Andres Freund wrote:

On 2022-02-12 13:25:58 +, Simon Riggs wrote:

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.


I think it'd be fine if we needed to. Or we could just make it always return
false or such.



just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.


Which other callers are you referring to?


I don't know of any, but we can't just remove a function in a
backbranch, can we?


We have in the past, if it's a "sufficiently internal" function.


I think we should remove it in v15, and leave it as it is in 
backbranches. Just add a comment to point out that the name is a bit 
misleading, because it checks the clog rather than the proc array.  It's 
not inherently dangerous, and someone might have a legit use case for 
it. True, someone might also be doing this incorrect thing with it, but 
still seems like the right balance to me.


I think we also need to change pg_xact_status(), to also call 
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). 
Currently, if it returns "committed", and you start a new transaction, 
the new transaction might not yet see the effects of that "committed" 
transaction. With that, you cannot reproduce the original bug with 
pg_xact_status() though, so there's no point in committing that test then.


In summary, I think we should:
- commit and backpatch Simon's 
just_remove_TransactionIdIsKnownCompleted_call.v1.patch

- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone 
beats me to it.


- Heikkidiff --git a/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl
new file mode 100644
index 00..8a651c0c60
--- /dev/null
+++ b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl
@@ -0,0 +1,86 @@
+# Test for visibility checks, when an updating transaction has already
+# written commit record but has not yet removed its entry from the
+# proc array, and thus isn't yet visible to other transactions.
+#
+# In particular, this reproduced an old bug in that, see:
+# https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use 

Re: Race condition in TransactionIdIsInProgress

2022-02-12 Thread Andres Freund
Hi,

On 2022-02-12 13:25:58 +, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 19:08, Andres Freund  wrote:
> 
> > On 2022-02-11 13:48:59 +, Simon Riggs wrote:
> > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs  
> > > wrote:
> > > >
> > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:
> > > >
> > > > > Looks lik syncrep will make this a lot worse, because it can 
> > > > > drastically
> > > > > increase the window between the TransactionIdCommitTree() and
> > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  
> > > > > But at
> > > > > least it might make it easier to write tests exercising this 
> > > > > scenario...
> > > >
> > > > Agreed
> > > >
> > > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > > item cache is set too early in some cases. The single item cache is
> > > > important for performance, so we just need to be more careful about
> > > > setting the cache.
> > >
> > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > > being set, but this fails! Not worked out why, yet.
> >
> > I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> > do TransactionLogFetch() internally.
> 
> That's not correct because you're confusing
> TransactionIdKnownNotInProgress(), which is a new function introduced
> by the patch, with the existing function
> TransactionIdIsKnownCompleted().

I don't think so. This call of the new TransactionIdKnownNotInProgress()

> --- a/src/backend/access/transam/transam.c
> +++ b/src/backend/access/transam/transam.c
> @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
>   return TRANSACTION_STATUS_ABORTED;
>   }
>  
> + /*
> +  * Safeguard that we have called TransactionIsIdInProgress() before
> +  * checking commit log manager, to ensure that we do not cache the
> +  * result until the xid is no longer in the procarray at eoxact.
> +  */
> + if (!TransactionIdKnownNotInProgress(transactionId))
> + return TRANSACTION_STATUS_IN_PROGRESS;
> +
>   /*
>* Get the transaction status.
>*/

isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to
TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call
TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so
TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the
(sub)transaction aborted.

Which explains why your first patch doesn't work.


> > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >
> > I think this might contain a different diff than what the name suggests?
> 
> Not at all, please don't be confused by the name. The patch removes
> the call to TransactionIdIsKnownCompleted() from
> TransactionIdIsInProgress().

I guess for me "just remove" doesn't include adding a new cache...


> I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.


> > > just removes the known offending call, passes make check, but IMHO
> > > leaves the same error just as likely by other callers.
> >
> > Which other callers are you referring to?
> 
> I don't know of any, but we can't just remove a function in a
> backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.


Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-12 Thread Simon Riggs
On Fri, 11 Feb 2022 at 19:08, Andres Freund  wrote:

> On 2022-02-11 13:48:59 +, Simon Riggs wrote:
> > On Fri, 11 Feb 2022 at 08:48, Simon Riggs  
> > wrote:
> > >
> > > On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:
> > >
> > > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > > increase the window between the TransactionIdCommitTree() and
> > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  
> > > > But at
> > > > least it might make it easier to write tests exercising this scenario...
> > >
> > > Agreed
> > >
> > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > item cache is set too early in some cases. The single item cache is
> > > important for performance, so we just need to be more careful about
> > > setting the cache.
> >
> > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > being set, but this fails! Not worked out why, yet.
>
> I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> do TransactionLogFetch() internally.

That's not correct because you're confusing
TransactionIdKnownNotInProgress(), which is a new function introduced
by the patch, with the existing function
TransactionIdIsKnownCompleted().


> > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
>
> I think this might contain a different diff than what the name suggests?

Not at all, please don't be confused by the name. The patch removes
the call to TransactionIdIsKnownCompleted() from
TransactionIdIsInProgress().

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.


> > just removes the known offending call, passes make check, but IMHO
> > leaves the same error just as likely by other callers.
>
> Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?


> To me it seems that the "real" reason behind this specific issue being
> nontrivial to fix and behind the frequent error of calling
> TransactionIdDidCommit() before TransactionIdIsInProgress() is
> that we've really made a mess out of the transaction status determination code
> / API. IMO the original sin is requiring the complicated
>
> if (TransactionIdIsCurrentTransactionId(xid)
> ...
> else if (TransactionIdIsInProgress(xid)
> ...
> else if (TransactionIdDidCommit(xid)
> ...
>
> dance at pretty much any place checking transaction status.

Agreed, it's pretty weird to have to call the functions in the right
order or you get the wrong answer. Especially since we have no
cross-check to ensure the correct sequence was followed.


> The multiple calls
> for the same xid is, I think, what pushed the caching down to the
> TransactionLogFetch level.
>
> ISTM that we should introduce something like TransactionIdStatus(xid) that
> returns
> XACT_STATUS_CURRENT
> XACT_STATUS_IN_PROGRESS
> XACT_STATUS_COMMITTED
> XACT_STATUS_ABORTED
> accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
> XidInMVCCSnapshot() instead of TransactionIdIsInProgress().
>
> I think that'd also make visibility checks faster - we spend a good chunk of
> cycles doing unnecessary function calls and repeatedly gathering information.
>
>
> It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
> optimized - TransactionIdIsInProgress() knows it doesn't need to check
> anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
> doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
> smaller than GetTopTransactionIdIfAny() before bsearching through subxids.
>
> Of course anything along these lines would never be backpatchable...

Agreed

I've presented a simple patch intended for backpatching. You may not
like the name, but please have another look at
"just_remove_TransactionIdIsKnownCompleted_call.v1.patch".
I believe it is backpatchable with minimal impact and without loss of
performance.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:
> FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
> r/w workload that's been modified to start 70 savepoints at the start shows
> 
> pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  
> t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table 
> "pgbench_branches"
> pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  
> t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table 
> "pgbench_branches"
> pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  
> t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table 
> "pgbench_branches"
> 
> after a few minutes of running with a local, not slowed down, syncrep. Without
> any other artifical slowdowns or such.

And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; 
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-10 22:11:38 -0800, Andres Freund wrote:
> Looks lik syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> least it might make it easier to write tests exercising this scenario...

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  
t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table 
"pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  
t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table 
"pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 
4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Andres Freund
Hi,

On 2022-02-11 13:48:59 +, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 08:48, Simon Riggs  
> wrote:
> >
> > On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:
> >
> > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > increase the window between the TransactionIdCommitTree() and
> > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But 
> > > at
> > > least it might make it easier to write tests exercising this scenario...
> >
> > Agreed
> >
> > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > item cache is set too early in some cases. The single item cache is
> > important for performance, so we just need to be more careful about
> > setting the cache.
> 
> Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.


> just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?


> just removes the known offending call, passes make check, but IMHO
> leaves the same error just as likely by other callers.

Which other callers are you referring to?



To me it seems that the "real" reason behind this specific issue being
nontrivial to fix and behind the frequent error of calling
TransactionIdDidCommit() before TransactionIdIsInProgress() is
that we've really made a mess out of the transaction status determination code
/ API. IMO the original sin is requiring the complicated

if (TransactionIdIsCurrentTransactionId(xid)
...
else if (TransactionIdIsInProgress(xid)
...
else if (TransactionIdDidCommit(xid)
...

dance at pretty much any place checking transaction status. The multiple calls
for the same xid is, I think, what pushed the caching down to the
TransactionLogFetch level.

ISTM that we should introduce something like TransactionIdStatus(xid) that
returns
XACT_STATUS_CURRENT
XACT_STATUS_IN_PROGRESS
XACT_STATUS_COMMITTED
XACT_STATUS_ABORTED
accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
XidInMVCCSnapshot() instead of TransactionIdIsInProgress().

I think that'd also make visibility checks faster - we spend a good chunk of
cycles doing unnecessary function calls and repeatedly gathering information.


It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
optimized - TransactionIdIsInProgress() knows it doesn't need to check
anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
smaller than GetTopTransactionIdIfAny() before bsearching through subxids.

Of course anything along these lines would never be backpatchable...

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Simon Riggs
On Fri, 11 Feb 2022 at 08:48, Simon Riggs  wrote:
>
> On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:
>
> > Looks lik syncrep will make this a lot worse, because it can drastically
> > increase the window between the TransactionIdCommitTree() and
> > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > least it might make it easier to write tests exercising this scenario...
>
> Agreed
>
> TransactionIdIsKnownCompleted(xid) is only broken because the single
> item cache is set too early in some cases. The single item cache is
> important for performance, so we just need to be more careful about
> setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch
just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

-- 
Simon Riggshttp://www.EnterpriseDB.com/


fix_cachedFetchXid.v1.patch
Description: Binary data


just_remove_TransactionIdIsKnownCompleted_call.v1.patch
Description: Binary data


Re: Race condition in TransactionIdIsInProgress

2022-02-11 Thread Simon Riggs
On Fri, 11 Feb 2022 at 06:11, Andres Freund  wrote:

> Looks lik syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 21:56:09 -0800, Andres Freund wrote:
> I think this may actually mean that the hot corruption problem fixed in
> 
> commit 18b87b201f7
> Author: Andres Freund 
> Date:   2021-12-10 20:12:26 -0800
> 
> Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while 
> pruning.
> 
> for 14/master is present in older branches too :(. Need to trace through the
> HTSV and pruning logic with a bit more braincells than currently available to
> be sure.

On second thought, there's probably sufficiently more direct corruption this
can cause than corruption via hot pruning. Not that that's not a problem, but
... Inconsistent TransactionIdIsInProgress() result can wreak havoc quite
broadly.

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
least it might make it easier to write tests exercising this scenario...

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Andres Freund
Hi,

On 2022-02-10 15:21:27 -0500, Robert Haas wrote:
> On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik  
> wrote:
> > Postgres first records state transaction in CLOG, then removes
> > transaction from procarray and finally releases locks.
> > But it can happen that transaction is marked as committed in CLOG,
> > XMAX_COMMITTED bit is set in modified tuple but
> > TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
> > uncommitted in tuple (%u,%u) to be updated in table"
> > error is reported.
>
> This is exactly why the HeapTupleSatisfiesFOO() functions all test
> TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
> it returns false. I don't think it's really legal to test
> TransactionIdDidCommit() without checking TransactionIdIsInProgress()
> first. And I bet that's exactly what this code is doing...

The problem is that the TransactionIdDidAbort() call is coming from within
TransactionIdIsInProgress().

If I understand correctly, the problem is that the xid cache stuff doesn't
correctly work for subtransactions and breaks TransactionIdIsInProgress().

For the problem to occur you need an xid that is for a subtransaction that
suboverflowed and that is 'clog committed' but not 'procarray committed'.

If that xid is tested twice with TransactionIdIsInProgress(), you will get a
bogus result on the second call within the same session.

The first TransactionIdIsInProgress() will go to
/*
 * Step 4: have to check pg_subtrans.
 *
 * At this point, we know it's either a subtransaction of one of the 
Xids
 * in xids[], or it's not running.  If it's an already-failed
 * subtransaction, we want to say "not running" even though its parent 
may
 * still be running.  So first, check pg_xact to see if it's been 
aborted.
 */
xc_slow_answer_inc();

if (TransactionIdDidAbort(xid))
return false;

and fetch that xid. The TransactionLogFetch() will set cachedFetchXid, because
the xid's status is TRANSACTION_STATUS_COMMITTED. Because the transaction
didn't abort TransactionIdIsInProgress() will go onto the the following loop,
see the toplevel xid is in progress in the procarray and return true.

The second call to TransactionIdIsInProgress() will see that the fetched
transaction matches the cached transactionid and return false.

/*
 * We may have just checked the status of this transaction, so if it is
 * already known to be completed, we can fall out without any access to
 * shared memory.
 */
if (TransactionIdIsKnownCompleted(xid))
{
xc_by_known_xact_inc();
return false;
}

Without any improper uses of TransactionIdDidAbort()/TransactionIdDidCommit()
before TransactionIdIsInProgress(). And other sessions will still return
true. And even *this* session can return true again, if another transaction is
checked that ends up caching another transaction status in cachedFetchXid.


It looks to me like the TransactionIdIsKnownCompleted() path in
TransactionIdIsInProgress() is just entirely broken. An prior
TransactionLogFetch() can't be allowed to make subsequent
TransactionIdIsInProgress() calls return wrong values.


If I understand the problem correctly, a SELECT
pg_xact_status(clog-committed-pgproc-in-progres) can make
TransactionIdIsInProgress() return wrong values too. It's not just
TransactionIdIsInProgress() itself.


Looks like this problem goes back all the way back to

commit 611b4393f22f2bb43135501cd6b7591299b6b453
Author: Tom Lane 
Date:   2008-03-11 20:20:35 +

Make TransactionIdIsInProgress check transam.c's single-item XID status 
cache
before it goes groveling through the ProcArray.  In situations where the 
same
recently-committed transaction ID is checked repeatedly by tqual.c, this 
saves
a lot of shared-memory searches.  And it's cheap enough that it shouldn't
hurt noticeably when it doesn't help.
Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.


I think this may actually mean that the hot corruption problem fixed in

commit 18b87b201f7
Author: Andres Freund 
Date:   2021-12-10 20:12:26 -0800

Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while 
pruning.

for 14/master is present in older branches too :(. Need to trace through the
HTSV and pruning logic with a bit more braincells than currently available to
be sure.

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Robert Haas
On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik  wrote:
> Postgres first records state transaction in CLOG, then removes
> transaction from procarray and finally releases locks.
> But it can happen that transaction is marked as committed in CLOG,
> XMAX_COMMITTED bit is set in modified tuple but
> TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
> uncommitted in tuple (%u,%u) to be updated in table"
> error is reported.

This is exactly why the HeapTupleSatisfiesFOO() functions all test
TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
it returns false. I don't think it's really legal to test
TransactionIdDidCommit() without checking TransactionIdIsInProgress()
first. And I bet that's exactly what this code is doing...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Alvaro Herrera
On 2022-Feb-10, Andrey Borodin wrote:

> > On 10 Feb 2022, at 21:46, Konstantin Knizhnik  wrote:
> > As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in 
> > table"
> > error is reported.
> 
> Wow, cool, that seem to be a solution to one more mysterious
> corruption thread - “reporting TID/table with corruption error" [0].
> Thank you!

Ooh, great find.  I'll have a look at this soon.  Thanks for CCing me.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)




Re: Race condition in TransactionIdIsInProgress

2022-02-10 Thread Andrey Borodin



> On 10 Feb 2022, at 21:46, Konstantin Knizhnik  wrote:
> As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
> error is reported.

Wow, cool, that seem to be a solution to one more mysterious corruption thread 
- “reporting TID/table with corruption error" [0]. Thank you!

Now it’s obvious that this is not a real data corruption. Maybe let’s remove 
corruption error code from the error? I had been literally woken at night by 
this code few times in January.

And do you have a plan how to fix the actual issue?


Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/202108191637.oqyzrdtnheir%40alvherre.pgsql





Race condition in TransactionIdIsInProgress

2022-02-10 Thread Konstantin Knizhnik

Hi hackers,

Postgres first records state transaction in CLOG, then removes 
transaction from procarray and finally releases locks.
But it can happen that transaction is marked as committed in CLOG, 
XMAX_COMMITTED bit is set in modified tuple but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is 
uncommitted in tuple (%u,%u) to be updated in table"

error is reported.

TransactionIdIsInProgress first checks for cached XID status:

    /*
     * We may have just checked the status of this transaction, so if it is
     * already known to be completed, we can fall out without any access to
     * shared memory.
     */
    if (TransactionIdIsKnownCompleted(xid))
    {
    xc_by_known_xact_inc();
    return false;
    }

This status cache is updated by TransactionLogFetch.
So once transaction status is fetched from CLOG, subsequent invocation 
of TransactionIdIsKnownCompleted will return true

even if transaction is not removed from procarray.

The following stack illustrates how it can happen:

#6  0x55ab7758bbaf in TransactionLogFetch (transactionId=4482)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:88

#7  TransactionLogFetch (transactionId=4482)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:52
#8  0x55ab7758bc7d in TransactionIdDidAbort 
(transactionId=transactionId@entry=4482)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:186

#9  0x55ab77812a43 in TransactionIdIsInProgress (xid=4482)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/storage/ipc/procarray.c:1587
#10 0x55ab7753de79 in HeapTupleSatisfiesVacuumHorizon 
(htup=htup@entry=0x7ffe562cec50, buffer=buffer@entry=1817,

    dead_after=dead_after@entry=0x7ffe562ceb14)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1281
#11 0x55ab7753e2c5 in HeapTupleSatisfiesNonVacuumable (buffer=1817, 
snapshot=0x7ffe562cec70, htup=0x7ffe562cec50)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1449
#12 HeapTupleSatisfiesVisibility (tup=tup@entry=0x7ffe562cec50, 
snapshot=snapshot@entry=0x7ffe562cec70, buffer=buffer@entry=1817)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1804
#13 0x55ab7752f3b5 in heap_hot_search_buffer 
(tid=tid@entry=0x7ffe562cec2a, relation=relation@entry=0x7fa57fbf17c8, 
buffer=buffer@entry=1817,
    snapshot=snapshot@entry=0x7ffe562cec70, 
heapTuple=heapTuple@entry=0x7ffe562cec50, all_dead=all_dead@entry=0x0, 
first_call=true)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:1802
#14 0x55ab775370d8 in heap_index_delete_tuples (rel=0x7fa57fbf17c8, 
delstate=)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:7480
#15 0x55ab7755767a in table_index_delete_tuples 
(delstate=0x7ffe562d0160, rel=0x2)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/include/access/tableam.h:1327
#16 _bt_delitems_delete_check (rel=rel@entry=0x7fa57fbf4518, 
buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8,

    delstate=delstate@entry=0x7ffe562d0160)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtpage.c:1541
#17 0x55ab7754e475 in _bt_bottomupdel_pass 
(rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916, 
heapRel=heapRel@entry=0x7fa57fbf17c8, newitemsz=20)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtdedup.c:406
#18 0x55ab7755010a in _bt_delete_or_dedup_one_page 
(rel=0x7fa57fbf4518, heapRel=0x7fa57fbf17c8, insertstate=0x7ffe562d0640,

--Type  for more, q to quit, c to continue without paging--
    simpleonly=, checkingunique=, 
uniquedup=, indexUnchanged=true)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:2763
#19 0x55ab775548c3 in _bt_findinsertloc (heapRel=0x7fa57fbf17c8, 
stack=0x55ab7a1cde08, indexUnchanged=true, checkingunique=true,

    insertstate=0x7ffe562d0640, rel=0x7fa57fbf4518)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:904
#20 _bt_doinsert (rel=rel@entry=0x7fa57fbf4518, 
itup=itup@entry=0x55ab7a1cde30, 
checkUnique=checkUnique@entry=UNIQUE_CHECK_YES,
    indexUnchanged=indexUnchanged@entry=true, 
heapRel=heapRel@entry=0x7fa57fbf17c8)
    at 
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:255
#21 0x55ab7755a931 in btinsert (rel=0x7fa57fbf4518, 
values=, isnull=,