Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-25 Thread Robert Haas
On Wed, Aug 25, 2021 at 5:36 AM Greg Nancarrow  wrote:
> I've attached an updated patch, hopefully more along the lines that
> you were thinking of.

LGTM. Committed and back-patched to v10 and up. In theory the same bug
exists in 9.6, but you'd have to have third-party code using the
parallel context infrastructure in order to hit it. If the patch
had back-patched cleanly, I would have done so just in case, but
shm_toc_lookup lacks a bool noError option in that version.
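
(For context, the relevant signature difference, as best I recall: 9.6's
shm_toc.h declares

    extern void *shm_toc_lookup(shm_toc *toc, uint64 key);

while v10 and later have

    extern void *shm_toc_lookup(shm_toc *toc, uint64 key, bool noError);

so the noError-based fix doesn't apply cleanly there.)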

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-25 Thread Greg Nancarrow
On Wed, Aug 25, 2021 at 1:37 AM Robert Haas  wrote:
>
> I guess I was thinking more of rejiggering things so that we save the
> results of each RestoreSnapshot() call in a local variable, e.g.
> asnapshot and tsnapshot. And then I think we could just
> RestoreTransactionSnapshot() on whichever one we want, and then
> PushActiveSnapshot(asnapshot) either way. I think it would be worth
> trying to move the PushActiveSnapshot() call out of the if statement
> instead of having it in two places, written differently but doing the same
> thing.
>

I've attached an updated patch, hopefully more along the lines that
you were thinking of.

Regards,
Greg Nancarrow
Fujitsu Australia


v11-0001-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-24 Thread Robert Haas
On Tue, Aug 24, 2021 at 12:20 AM Greg Nancarrow  wrote:
> I initially thought this too, but RestoreTransactionSnapshot() sets up
> the resultant transaction snapshot in "CurrentSnapshot", which is
> static to snapmgr.c (like the other pointers to valid snapshots) and I
> didn't really want to mess with the visibility of that, to allow a
> call to PushActiveSnapshot(CurrentSnapshot) in parallel.c. Also, I
> wasn't sure if it was safe to call GetTransactionSnapshot() here
> without the risk of unwanted side-effects - but, looking at it again,
> I think it is probably OK, so I did use it in my revised patch
> (attached) and removed
> that new function RestoreTxnSnapshotAndSetAsActive(). Do you agree
> that it is OK to call GetTransactionSnapshot() here?

I guess I was thinking more of rejiggering things so that we save the
results of each RestoreSnapshot() call in a local variable, e.g.
asnapshot and tsnapshot. And then I think we could just
RestoreTransactionSnapshot() on whichever one we want, and then
PushActiveSnapshot(asnapshot) either way. I think it would be worth
trying to move the PushActiveSnapshot() call out of the if statement
instead of having it in two places, written differently but doing the same
thing.
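
Schematically, I mean something like this in ParallelWorkerMain() - an
untested sketch with illustrative variable names, not a finished patch:

    Snapshot    asnapshot;
    Snapshot    tsnapshot;

    /*
     * Restore the active snapshot; the transaction snapshot may or may
     * not have been serialized separately by the leader.
     */
    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
    asnapshot = RestoreSnapshot(asnapspace);
    tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;
    RestoreTransactionSnapshot(tsnapshot, fps->parallel_leader_pgproc);
    PushActiveSnapshot(asnapshot);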

> I agree, that is a better comment and detailed description, but didn't
> you mean "If the transaction isolation level is REPEATABLE READ or
> SERIALIZABLE ..."?

I sure did!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-23 Thread Greg Nancarrow
On Tue, Aug 24, 2021 at 5:00 AM Robert Haas  wrote:
>
>
> I think this looks pretty good. I am not sure I see any reason to
> introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
> just use RestoreTransactionSnapshot() and then call
> PushActiveSnapshot() from parallel.c? That seems preferable to me from
> the standpoint of avoiding multiplication of APIs.
>

I initially thought this too, but RestoreTransactionSnapshot() sets up
the resultant transaction snapshot in "CurrentSnapshot", which is
static to snapmgr.c (like the other pointers to valid snapshots) and I
didn't really want to mess with the visibility of that, to allow a
call to PushActiveSnapshot(CurrentSnapshot) in parallel.c. Also, I
wasn't sure if it was safe to call GetTransactionSnapshot() here
without the risk of unwanted side-effects - but, looking at it again,
I think it is probably OK, so I did use it in my revised patch
(attached) and removed
that new function RestoreTxnSnapshotAndSetAsActive(). Do you agree
that it is OK to call GetTransactionSnapshot() here?

> I also think that the comments should explain why we are doing this,
> rather than just that we are doing this. So instead of this:
>
> +/*
> + * If the transaction snapshot was serialized, restore it and restore the
> + * active snapshot too. Otherwise, the active snapshot is also installed as
> + * the transaction snapshot.
> + */
>
> ...perhaps something like:
>
> If the transaction isolation level is READ COMMITTED or SERIALIZABLE,
> the leader has serialized the transaction snapshot and we must restore
> it. At lower isolation levels, there is no transaction-lifetime
> snapshot, but we need TransactionXmin to get set to a value which is
> less than or equal to the xmin of every snapshot that will be used by
> this worker. The easiest way to accomplish that is to install the
> active snapshot as the transaction snapshot. Code running in this
> parallel worker might take new snapshots via GetTransactionSnapshot()
> or GetLatestSnapshot(), but it shouldn't have any way of acquiring a
> snapshot older than the active snapshot.
>

I agree, that is a better comment and detailed description, but didn't
you mean "If the transaction isolation level is REPEATABLE READ or
SERIALIZABLE ..."?

since we have:

#define XACT_READ_UNCOMMITTED   0
#define XACT_READ_COMMITTED 1
#define XACT_REPEATABLE_READ2
#define XACT_SERIALIZABLE   3

#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)


Regards,
Greg Nancarrow
Fujitsu Australia


v10-0001-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-23 Thread Robert Haas
On Wed, Aug 18, 2021 at 9:28 AM Greg Nancarrow  wrote:
> Yes, I think I agree on that.
> I've updated the patch to restore the actual transaction snapshot in
> the IsolationUsesXactSnapshot() case, otherwise the active snapshot is
> installed as the transaction snapshot.
> I've tested the patch for the different transaction isolation levels,
> and the reported coredump (from assertion failure) is not occurring.
> (In the "serializable" case there are "could not serialize access due
> to read/write dependencies among transactions" errors, as Pavel has
> previously reported, but these occur without the patch and it appears
> to be an unrelated issue)

I think this looks pretty good. I am not sure I see any reason to
introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
just use RestoreTransactionSnapshot() and then call
PushActiveSnapshot() from parallel.c? That seems preferable to me from
the standpoint of avoiding multiplication of APIs.

I also think that the comments should explain why we are doing this,
rather than just that we are doing this. So instead of this:

+/*
+ * If the transaction snapshot was serialized, restore it and restore the
+ * active snapshot too. Otherwise, the active snapshot is also installed as
+ * the transaction snapshot.
+ */

...perhaps something like:

If the transaction isolation level is READ COMMITTED or SERIALIZABLE,
the leader has serialized the transaction snapshot and we must restore
it. At lower isolation levels, there is no transaction-lifetime
snapshot, but we need TransactionXmin to get set to a value which is
less than or equal to the xmin of every snapshot that will be used by
this worker. The easiest way to accomplish that is to install the
active snapshot as the transaction snapshot. Code running in this
parallel worker might take new snapshots via GetTransactionSnapshot()
or GetLatestSnapshot(), but it shouldn't have any way of acquiring a
snapshot older than the active snapshot.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-18 Thread Greg Nancarrow
On Wed, Aug 18, 2021 at 5:00 AM Robert Haas  wrote:
>
> Ah ha! Thank you. So I think what I was missing here is that even
> though the transaction snapshot is not a well-defined concept when
> !IsolationUsesXactSnapshot(), we still need TransactionXmin to be set
> to a value that's earlier than any XID we might inquire about. So the
> proposal to install the leader's active snapshot as the worker's
> transaction snapshot is really just a way of making that happen. Now
> that I understand better, that seems OK to me when
> !IsolationUsesXactSnapshot(), but otherwise I think we need to
> serialize and restore the actual transaction snapshot. Do you agree?
>

Yes, I think I agree on that.
I've updated the patch to restore the actual transaction snapshot in
the IsolationUsesXactSnapshot() case, otherwise the active snapshot is
installed as the transaction snapshot.
I've tested the patch for the different transaction isolation levels,
and the reported coredump (from assertion failure) is not occurring.
(In the "serializable" case there are "could not serialize access due
to read/write dependencies among transactions" errors, as Pavel has
previously reported, but these occur without the patch and it appears
to be an unrelated issue)


Regards,
Greg Nancarrow
Fujitsu Australia


v9-0001-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-17 Thread Robert Haas
On Fri, Aug 13, 2021 at 2:52 AM Greg Nancarrow  wrote:
> With your proposed approach, what I'm seeing is that the worker calls
> GetTransactionSnapshot() at some point, which then builds a new
> snapshot, and results in increasing TransactionXmin (probably because
> another concurrent transaction has since completed). This snapshot is
> thus later than the snapshot used in the execution state of the query
> being executed. This causes the Assert in
> SubTransGetTopmostTransaction() to fire because the xid doesn't follow
> or equal the TransactionXmin value.

Ah ha! Thank you. So I think what I was missing here is that even
though the transaction snapshot is not a well-defined concept when
!IsolationUsesXactSnapshot(), we still need TransactionXmin to be set
to a value that's earlier than any XID we might inquire about. So the
proposal to install the leader's active snapshot as the worker's
transaction snapshot is really just a way of making that happen. Now
that I understand better, that seems OK to me when
!IsolationUsesXactSnapshot(), but otherwise I think we need to
serialize and restore the actual transaction snapshot. Do you agree?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-13 Thread Greg Nancarrow
On Thu, Aug 12, 2021 at 5:37 AM Robert Haas  wrote:
>
> 1. Then why doesn't the approach I proposed fix it?
>

I think that with your approach, it is not doing the expected
initialization done by SetTransactionSnapshot() (which is called by
RestoreTransactionSnapshot(), which your approach skips in the case of
the SQL script that reproduces the problem, because
IsolationUsesXactSnapshot() returns false for XACT_READ_COMMITTED).
There are comments in SetTransactionSnapshot() explaining the tricky
parts of this initialization: testing that the source transaction is
still running, dealing with a race condition, and setting up
TransactionXmin.
Also, there's an "if (IsolationUsesXactSnapshot()) ..." block within
that function, doing some required setup for transaction-snapshot
mode, so it doesn't seem like a good idea to skip calling
RestoreTransactionSnapshot() when !IsolationUsesXactSnapshot(): the
function clearly caters for both cases, whether or not the isolation
level uses a transaction snapshot. So I think
SetTransactionSnapshot() always needs to be called.
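
For reference, the block I mean looks roughly like this (paraphrased
from snapmgr.c, so check the actual source):

    if (IsolationUsesXactSnapshot())
    {
        if (IsolationIsSerializable())
            SetSerializableTransactionSnapshot(CurrentSnapshot,
                                               sourcevxid, sourcepid);
        /* Make a copy that survives for the rest of the transaction */
        CurrentSnapshot = CopySnapshot(CurrentSnapshot);
        FirstXactSnapshot = CurrentSnapshot;
        /* Mark it as "registered" in FirstXactSnapshot */
        FirstXactSnapshot->regd_count++;
        pairingheap_add(&RegisteredSnapshots, &FirstXactSnapshot->ph_node);
    }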

With your proposed approach, what I'm seeing is that the worker calls
GetTransactionSnapshot() at some point, which then builds a new
snapshot, and results in increasing TransactionXmin (probably because
another concurrent transaction has since completed). This snapshot is
thus later than the snapshot used in the execution state of the query
being executed. This causes the Assert in
SubTransGetTopmostTransaction() to fire because the xid doesn't follow
or equal the TransactionXmin value.

> 2. Consider the case where the toplevel query is something like SELECT
> complexfunc() FROM generate_series(1,10) g -- in a case like this, I
> think complexfunc() can cause snapshots to be taken internally. For
> example suppose we end up inside exec_eval_simple_expr, or
> SPI_cursor_open_internal, in either case with read_only = false. Here
> we're going to again call GetTransactionSnapshot() and then execute a
> query which may use parallelism.
>
>

A query always uses the ActiveSnapshot at the time the QueryDesc is
created, so as long as you don't (as the current code does) obtain a
potentially later snapshot and try to restore that in the worker as
the TransactionSnapshot (or let the worker create a new snapshot,
because no TransactionSnapshot was restored, which may have a greater
xmin than the ActiveSnapshot), then I think it should be OK.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-11 Thread Robert Haas
On Wed, Aug 11, 2021 at 8:32 AM Greg Nancarrow  wrote:
> This is explained by the TransactionSnapshot being a later snapshot in
> this case.
> So this is why it seems to be wrong to call GetTransactionSnapshot()
> in InitializeParallelDSM() and use a separate, potentially later,
> snapshot than that used in the execution state for the query.

Thanks for the research. I agree with your logic here, but:

1. Then why doesn't the approach I proposed fix it?

2. Consider the case where the toplevel query is something like SELECT
complexfunc() FROM generate_series(1,10) g -- in a case like this, I
think complexfunc() can cause snapshots to be taken internally. For
example suppose we end up inside exec_eval_simple_expr, or
SPI_cursor_open_internal, in either case with read_only = false. Here
we're going to again call GetTransactionSnapshot() and then execute a
query which may use parallelism.


-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-11 Thread Greg Nancarrow
On Tue, Aug 10, 2021 at 12:35 AM Robert Haas  wrote:
>
> Now there is evidently something wrong with this line of reasoning
> because the code is buggy and my proposed fix doesn't work, but I
> don't know what is wrong with it. You seem to think that it's crazy
> that we try to replicate the active snapshot to the active snapshot
> and the transaction snapshot to the transaction snapshot, but that
> did, and still does, seem really sane to me. The only part that now
> seems clearly wrong to me is that !IsolationUsesXactSnapshot() case
> takes an *extra* snapshot, but since fixing that didn't fix the bug,
> there's evidently more to the problem than that.
>

I traced through snapshot processing during a parallel SELECT, up to
the point of the existing GetTransactionSnapshot() and
GetCurrentSnapshot() calls in InitializeParallelDSM().
I'm seeing the following sequence of calls, numbered below:


PortalStart():

   case PORTAL_ONE_SELECT:

(1)PushActiveSnapshot(GetTransactionSnapshot());

...

   queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
   portal->sourceText,
(2)   GetActiveSnapshot(),
   InvalidSnapshot,
   None_Receiver,
   params,
   portal->queryEnv,
   0);

...

(3)PopActiveSnapshot();


PortalRunSelect():

(4)PushActiveSnapshot(queryDesc->snapshot);
ExecutorRun(queryDesc, direction, (uint64) count,
 portal->run_once);

InitializeParallelDSM():

(5)Snapshot   transaction_snapshot = GetTransactionSnapshot();
(6)Snapshot   active_snapshot = GetActiveSnapshot();


 nprocessed = queryDesc->estate->es_processed;
(7) PopActiveSnapshot();



The snapshot used in execution of the query is clearly the
ActiveSnapshot at the time of creating the QueryDesc [at (2)] which is
a copy of the TransactionSnapshot originally acquired [at (1)].

In InitializeParallelDSM() it acquires both the TransactionSnapshot
[at (5)] and the ActiveSnapshot [at (6)], to be serialized in the DSM
for the workers (each of which will deserialize and restore these).
But the problem I see is that the GetTransactionSnapshot() call [at
(5)] may acquire a new snapshot (i.e. a later snapshot than the
ActiveSnapshot used in the execution of the query), for example, if a
concurrent transaction has completed since GetTransactionSnapshot()
was last called [in (1)].
In this case, GetTransactionSnapshot() calls GetSnapshotDataReuse()
and it returns false, causing a new snapshot to be built by
GetTransactionSnapshot().

    curXactCompletionCount = ShmemVariableCache->xactCompletionCount;
    if (curXactCompletionCount != snapshot->snapXactCompletionCount)
        return false;

When this TransactionSnapshot is restored in a worker process, it
accordingly sets TransactionXmin, and if we look back at the coredump
stacktrace and the Assert condition that failed in the worker, we see
that the xid was expected to be >= TransactionXmin, but the Assert
fired because the xid was < TransactionXmin.

Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

This is explained by the TransactionSnapshot being a later snapshot in
this case.
So this is why it seems to be wrong to call GetTransactionSnapshot()
in InitializeParallelDSM() and use a separate, potentially later,
snapshot than that used in the execution state for the query.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-09 Thread Robert Haas
On Wed, Aug 4, 2021 at 10:03 PM Greg Nancarrow  wrote:
> In setting up the snapshot for the execution state used in command
> execution, GetTransactionSnapshot() is called and (possibly a copy of)
> the returned snapshot is pushed as the ActiveSnapshot.

I mean, there are LOTS of PushActiveSnapshot() calls in the code. A
lot of those specifically say
PushActiveSnapshot(GetTransactionSnapshot()) but others are pushing
snapshots obtained from various other places. I really don't think it
can possibly be correct in general to assume that the snapshot on top
of the active snapshot stack is the same as the transaction snapshot.

> So why (current Postgres code, no patches applied) in setting up for
> parallel-worker execution (in InitializeParallelDSM) does the Postgres
> code then acquire ANOTHER TransactionSnapshot (by calling
> GetTransactionSnapshot(), which could return CurrentSnapshot or a new
> snapshot) and serialize that, as well as serializing what the
> ActiveSnapshot points to, and then restore those in the workers as two
> separate snapshots? Is it a mistake? Or if intentional and correct,
> how so?

Well, I already agreed that in cases where GetTransactionSnapshot()
will acquire an altogether new snapshot, we shouldn't call it, but
beyond that I don't see why you think this is wrong. I mean, suppose
we only call GetTransactionSnapshot() at parallel worker startup when
IsolationUsesXactSnapshot(). In that case, CurrentSnapshot is a
durable, transaction-lifetime piece of backend-local state that can
affect the results of future calls to GetTransactionSnapshot(), and
therefore seems to need to be replicated to workers. Separately,
regardless of IsolationUsesXactSnapshot(), the active snapshot is
accessible via calls to GetActiveSnapshot() and therefore should also
be replicated to workers. Now, I don't know of any necessary reason
why those two things need to be the same, because again, there are
PushActiveSnapshot() calls all over the place, and they're not all
pushing the transaction snapshot. So therefore it makes sense to me
that we need to replicate those two things separately - the active
snapshot in the leader becomes the active snapshot in the workers and
the transaction snapshot in the leader becomes the transaction
snapshot in the worker.

Now there is evidently something wrong with this line of reasoning
because the code is buggy and my proposed fix doesn't work, but I
don't know what is wrong with it. You seem to think that it's crazy
that we try to replicate the active snapshot to the active snapshot
and the transaction snapshot to the transaction snapshot, but that
did, and still does, seem really sane to me. The only part that now
seems clearly wrong to me is that !IsolationUsesXactSnapshot() case
takes an *extra* snapshot, but since fixing that didn't fix the bug,
there's evidently more to the problem than that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-08 Thread Greg Nancarrow
On Thu, Aug 5, 2021 at 12:03 PM Greg Nancarrow  wrote:
>

As the current v7 patch doesn't fix the coredump issue, and the cfbot
is now failing (one of the regression tests fails), I'm reinstating my
v2/v5 patch (as v8) as the current best solution to this issue.
So far I haven't found a test case (e.g. isolation level) that breaks
the patch. Pavel did encounter an error (but no crash) when testing
the patch with SERIALIZABLE, but he found that this error occurred
anyway without the patch.

Regards,
Greg Nancarrow
Fujitsu Australia


v8-0001-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 11:43 PM Robert Haas  wrote:
>
> Why do you think it's right to install the serialized *active*
> snapshot as the *transaction* snapshot? I've been operating on the
> presumption that we wanted the worker to install the leader's
> transaction snapshot as its transaction snapshot as its transaction
> snapshot and the leader's active snapshot as its active snapshot,
> because in my mind the active snapshot and the transaction snapshot
> are two separate things. However, there's clearly something wrong with
> that line of reasoning, because apparently your proposal fixes the
> problem and mine doesn't.
>

In setting up the snapshot for the execution state used in command
execution, GetTransactionSnapshot() is called and (possibly a copy of)
the returned snapshot is pushed as the ActiveSnapshot.
The way I see it is that there is essentially only one snapshot here,
the last-acquired TransactionSnapshot, which the ActiveSnapshot points
to (or a copy of it). Rightly or wrongly, my v2/v5 patch is basically
changing the code to do the same snapshot setup in each of the
parallel workers.
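
Schematically, the v2/v5 worker-side setup amounts to something like
this (an illustrative sketch, not the patch text itself):

    /*
     * Restore the leader's active snapshot and use it as both the
     * transaction snapshot and the active snapshot, mirroring the
     * leader's PushActiveSnapshot(GetTransactionSnapshot()) setup.
     */
    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    asnap = RestoreSnapshot(asnapspace);
    RestoreTransactionSnapshot(asnap, fps->parallel_leader_pgproc);
    PushActiveSnapshot(asnap);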

So why (current Postgres code, no patches applied) in setting up for
parallel-worker execution (in InitializeParallelDSM) does the Postgres
code then acquire ANOTHER TransactionSnapshot (by calling
GetTransactionSnapshot(), which could return CurrentSnapshot or a new
snapshot) and serialize that, as well as serializing what the
ActiveSnapshot points to, and then restore those in the workers as two
separate snapshots? Is it a mistake? Or if intentional and correct,
how so?

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Robert Haas
On Tue, Aug 3, 2021 at 11:41 PM Greg Nancarrow  wrote:
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mind or correct additional issues
> you didn't foresee?

This is the sort of thing I was thinking about but I don't understand
why it doesn't fix the reported problem. Apparently I haven't
correctly understood what the issue is.

> >I don't know why Greg's patch is changing anything related to the
> >active snapshot (as opposed to the transaction snapshot). Maybe
> >there's a reason why we need that change, but I don't know what it is.
>
> I don't think my v2/v5 patch is changing anything related to the
> active snapshot (is it?).
> It's restoring the serialized active snapshot, installing it as the
> transaction snapshot and setting it as the active snapshot.

Why do you think it's right to install the serialized *active*
snapshot as the *transaction* snapshot? I've been operating on the
presumption that we wanted the worker to install the leader's
transaction snapshot as its transaction snapshot and the leader's
active snapshot as its active snapshot,
because in my mind the active snapshot and the transaction snapshot
are two separate things. However, there's clearly something wrong with
that line of reasoning, because apparently your proposal fixes the
problem and mine doesn't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Pavel Borisov
>
> At SERIALIZABLE level with v2/v5 I get an error which I don't have before
> the patch (but no crash):
> pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:
>  could not serialize access due to read/write dependencies among
> transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during
> conflict out checking.
>
I should correct myself: the mentioned error under SERIALIZABLE is also
present before the patch.


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Pavel Borisov
On Wed, Aug 4, 2021 at 2:18 PM Greg Nancarrow  wrote:

> On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov 
> wrote:
> >
> >>
> > Greg, thanks for the fast response! I suppose that a check for
> > IsolationUsesXactSnapshot() is also needed in GetTransactionSnapshot()
> > to correctly handle the case of a NULL transaction snapshot.
> > This corrects the mentioned check-world test.
> > PFA v7 patch.
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)
>
Now I've run my test script (attached above in the thread) on v6 and v7,
and quite soon got crashes with the Assert and a backtrace identical to
the original one. So the v7 patch may be useful for further development,
but it is not enough to fix the original crash.

The same script run on the v2/v5 patch completed without a crash at every
isolation level I tested, i.e. READ COMMITTED, REPEATABLE READ and
SERIALIZABLE. If I remember correctly, none of us could demonstrate any
errors with REPEATABLE READ or SERIALIZABLE on v2/v5. That fact was the
basis of my proposal to commit v2/v5, i.e. to fix the obvious bug and let
further improvements (if any) be done later.

At SERIALIZABLE level with v2/v5 I get an error which I don't have before
the patch (but no crash):
pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:
 could not serialize access due to read/write dependencies among
transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during
conflict out checking.


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 8:17 PM Greg Nancarrow  wrote:
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)
>

Er, with the v7 patch, the problem still occurs (that Assert still
fires during a run of the SubTransGetTopmostTransaction-rep.sh
script).
Stacktrace from the coredump is below:

#0  0x7f06e5e7d37f in raise () from /lib64/libc.so.6
#1  0x7f06e5e67db5 in abort () from /lib64/libc.so.6
#2  0x00b06547 in ExceptionalCondition (
conditionName=0xba8d78 "TransactionIdFollowsOrEquals(xid,
TransactionXmin)", errorType=0xba8d0b "FailedAssertion",
fileName=0xba8d00 "subtrans.c", lineNumber=156) at assert.c:69
#3  0x00576f0f in SubTransGetTopmostTransaction (xid=3676) at
subtrans.c:156
#4  0x00b5f55a in XidInMVCCSnapshot (xid=3676,
snapshot=0x2e44560) at snapmgr.c:2293
#5  0x0050e014 in HeapTupleSatisfiesMVCC (htup=0x7ffdc6dee6f0,
snapshot=0x2e44560, buffer=15559)
at heapam_visibility.c:1070
#6  0x0050f148 in HeapTupleSatisfiesVisibility
(tup=0x7ffdc6dee6f0, snapshot=0x2e44560, buffer=15559)
at heapam_visibility.c:1771
#7  0x004f1d26 in heapgetpage (sscan=0x2e9a9a0, page=10310) at
heapam.c:466
#8  0x004f45cf in heapgettup_pagemode (scan=0x2e9a9a0,
dir=ForwardScanDirection, nkeys=0, key=0x0)
at heapam.c:1118
#9  0x004f4c91 in heap_getnextslot (sscan=0x2e9a9a0,
direction=ForwardScanDirection, slot=0x2e8f9a0)
at heapam.c:1396
#10 0x0077abb1 in table_scan_getnextslot (sscan=0x2e9a9a0,
direction=ForwardScanDirection, slot=0x2e8f9a0)
at ../../../src/include/access/tableam.h:1044
#11 0x0077ac79 in SeqNext (node=0x2e8f428) at nodeSeqscan.c:80
#12 0x0073d997 in ExecScanFetch (node=0x2e8f428,
accessMtd=0x77abe5 , recheckMtd=0x77ac8a )
at execScan.c:133


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov  wrote:
>
>>
> Greg, thanks for the fast response! I suppose that a check for
> IsolationUsesXactSnapshot() is also needed in GetTransactionSnapshot()
> to correctly handle the case of a NULL transaction snapshot.
> This corrects the mentioned check-world test.
> PFA v7 patch.

Ah, thanks for that (I didn't debug that failure).
But is the coredump issue reproducible now? (using v7 and your test script)


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-04 Thread Pavel Borisov
On Wed, Aug 4, 2021 at 7:41 AM Greg Nancarrow  wrote:

> On Wed, Aug 4, 2021 at 3:21 AM Robert Haas  wrote:
> >
> >The idea I sort of had floating around in my mind is a little
> >different than what Greg has implemented. I was thinking that we could
> >just skip SerializeSnapshot and the corresponding shm_toc_allocate()
> >if !IsolationUsesXactSnapshot(). Then on the restore side we could
> >just call shm_toc_lookup() with noError = true and skip
> >RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.
>
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mind or correct additional issues
> you didn't foresee?
>
> t/001_pgbench_with_server.pl .. 10/?
> #   Failed test 'pgbench scale 1 initialization status (got 1 vs expected
> 0)'
> #   at t/001_pgbench_with_server.pl line 108.
> ...
> # creating primary keys...
> # pgbench: fatal: query failed: ERROR:  cannot take query snapshot
> during a parallel operation
> # CONTEXT:  parallel worker
> # pgbench: query was: alter table pgbench_accounts add primary key (aid)
>
Greg, thanks for the fast response! I suppose that a check for
IsolationUsesXactSnapshot() is also needed in GetTransactionSnapshot()
to correctly handle the case of a NULL transaction snapshot.
This corrects the mentioned check-world test.
PFA v7 patch.


v7-0001-Correct-usage-of-transaction-snapshot-in-a-parall.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-03 Thread Greg Nancarrow
On Wed, Aug 4, 2021 at 3:21 AM Robert Haas  wrote:
>
>The idea I sort of had floating around in my mind is a little
>different than what Greg has implemented. I was thinking that we could
>just skip SerializeSnapshot and the corresponding shm_toc_allocate()
>if !IsolationUsesXactSnapshot(). Then on the restore side we could
>just call shm_toc_lookup() with noError = true and skip
>RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.

I've tried to follow your description and have attached a patch to
hopefully match it, but it doesn't pass "make check-world".
Perhaps I messed something up (apologies if so), or additional changes
are needed to match what you had in mind or correct additional issues
you didn't foresee?

t/001_pgbench_with_server.pl .. 10/?
#   Failed test 'pgbench scale 1 initialization status (got 1 vs expected 0)'
#   at t/001_pgbench_with_server.pl line 108.
...
# creating primary keys...
# pgbench: fatal: query failed: ERROR:  cannot take query snapshot
during a parallel operation
# CONTEXT:  parallel worker
# pgbench: query was: alter table pgbench_accounts add primary key (aid)


>I don't know why Greg's patch is changing anything related to the
>active snapshot (as opposed to the transaction snapshot). Maybe
>there's a reason why we need that change, but I don't know what it is.

I don't think my v2/v5 patch is changing anything related to the
active snapshot (is it?).
It's restoring the serialized active snapshot, installing it as the
transaction snapshot and setting it as the active snapshot.
The patch removes the additionally-acquired transaction snapshot in
InitializeParallelDSM (which seems to be a later snapshot than the one
actually used in the execution state for the statement, back in the
leader, if I recall correctly). Basically, in the parallel
workers, I've tried to match the snapshot setup to that used in the
leader.
If the v2/v5 patch doesn't work correctly for some isolation level,
I've yet to find it (but can't absolutely rule out there's some case
not accounted for).

Regards,
Greg Nancarrow
Fujitsu Australia


v6-0001-RH-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-03 Thread Robert Haas
On Tue, Aug 3, 2021 at 9:59 AM Pavel Borisov  wrote:
> I've looked at the v5 patch. It is identical to the v2 patch, which I've
> already tested using the workflow I described in the comments above.
> Before the patch I get the errors quite soon; the patch corrects them.
> Furthermore, I've tested the same patch under REPEATABLE READ and
> SERIALIZABLE and detected no flaws. So, now that we've got Robert's
> explanation, it seems to me that the v2 (aka v5) patch can be committed
> (leaving possible REPEATABLE READ and SERIALIZABLE improvements for the
> future). I'm not sure it is still possible in the 07/21 CF, but I'd
> change the status of the patch to ready-for-committer. Also, I'd like
> the bugfix to be backported to previous PG versions.

I agree that the fix should be back-ported, but I'm not keen to commit
anything unless it works for all isolation levels.

The idea I sort of had floating around in my mind is a little
different than what Greg has implemented. I was thinking that we could
just skip SerializeSnapshot and the corresponding shm_toc_allocate()
if !IsolationUsesXactSnapshot(). Then on the restore side we could
just call shm_toc_lookup() with noError = true and skip
RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.
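
In rough code terms, a sketch only (tsnaplen here is a stand-in for the
estimated snapshot size):

    /*
     * Leader side, InitializeParallelDSM(): only serialize a transaction
     * snapshot when the transaction actually has one.
     */
    if (IsolationUsesXactSnapshot())
    {
        tsnapspace = shm_toc_allocate(pcxt->toc, tsnaplen);
        SerializeSnapshot(transaction_snapshot, tsnapspace);
        shm_toc_insert(pcxt->toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT,
                       tsnapspace);
    }

    /* Worker side, ParallelWorkerMain(): tolerate its absence. */
    tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
    if (tsnapspace != NULL)
        RestoreTransactionSnapshot(RestoreSnapshot(tsnapspace),
                                   fps->parallel_leader_pgproc);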

I don't know why Greg's patch is changing anything related to the
active snapshot (as opposed to the transaction snapshot). Maybe
there's a reason why we need that change, but I don't know what it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-08-03 Thread Pavel Borisov
On Fri, Jul 23, 2021 at 10:00 AM Greg Nancarrow  wrote:

> On Thu, Jul 22, 2021 at 2:15 AM Robert Haas  wrote:
> >
> > Thanks to Thomas Munro for drawing my attention to this thread. I
> > wasn't intentionally ignoring it, but there's a lot of email in the
> > world and only so much time.
> >
> > When I wrote this code originally, the idea that I had in mind was
> > simply this: whatever state we have in the leader ought to be
> > reproduced in each worker. So if there's an active snapshot in the
> > leader, we ought to make that active in all the workers, and if
> > there's a transaction snapshot in the leader, we ought to make that
> > the transaction snapshot in all of the workers.
> >
> > But I see now that my thinking was fuzzy, and I'm going to blame that
> > on the name GetTransactionSnapshot() being slightly misleading. If
> > IsolationUsesXactSnapshot() is true, then there's really such a thing
> > as a transaction snapshot and reproducing that in the worker is a
> > sensible thing to do. But when !IsolationUsesXactSnapshot(),
> > GetTransactionSnapshot() doesn't just "get the transaction snapshot",
> > because there isn't any such thing. It takes a whole new snapshot, on
> > the theory that you wouldn't be calling this function unless you had
> > finished up with the snapshot you got the last time you called this
> > function. And in the case of initiating parallel query, that is the
> > wrong thing.
> >
> > I think that, at least in the case where IsolationUsesXactSnapshot()
> > is true, we need to make sure that calling GetTransactionSnapshot() in
> > a worker produces the same result that it would have produced in the
> > leader. Say one of the workers calls an sql or plpgsql function and
> > that function runs a bunch of SQL statements. It seems to me that
> > there's probably a way for this to result in calls inside the worker
> > to GetTransactionSnapshot(), and if that doesn't return the same
> > snapshot as in the leader, then we've broken MVCC.
> >
> > What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
> > to just skip this altogether in that case. Certainly what we're doing
> > can't be right, because copying a snapshot that wouldn't have been
> > taken without parallel query can't ever be the right thing to do.
> > Perhaps we need to copy something else instead. I'm not really sure.
> >
> > So I think v2 is probably on the right track, but wrong when the
> > transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
> > and v4 just seem like unprincipled hacks that try to avoid the
> > assertion failure by lying about whether there's a problem.
> >
>
> Many thanks for taking time to respond to this (and thanks to Thomas Munro
> too).
> It's much appreciated, as this is a complex area.
> For the time being, I'll reinstate the v2 patch (say as v5) as a
> partial solution, and then work on addressing the REPEATABLE READ and
> SERIALIZABLE transaction isolation levels, which you point out are not
> handled correctly by the patch.
>
I've looked at the v5 patch. It is identical to the v2 patch, which I've
already tested using the workflow I described in the comments above.
Before the patch I get the errors quite soon; the patch corrects them.
Furthermore, I've tested the same patch under REPEATABLE READ and
SERIALIZABLE and detected no flaws. So, now that we've got Robert's
explanation, it seems to me that the v2 (aka v5) patch can be committed
(leaving possible REPEATABLE READ and SERIALIZABLE improvements for the
future). I'm not sure it is still possible in the 07/21 CF, but I'd
change the status of the patch to ready-for-committer. Also, I'd like
the bugfix to be backported to previous PG versions.

Further consideration of the patch and of the backporting is very much
welcome!

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-23 Thread Greg Nancarrow
On Thu, Jul 22, 2021 at 2:15 AM Robert Haas  wrote:
>
> Thanks to Thomas Munro for drawing my attention to this thread. I
> wasn't intentionally ignoring it, but there's a lot of email in the
> world and only so much time.
>
> When I wrote this code originally, the idea that I had in mind was
> simply this: whatever state we have in the leader ought to be
> reproduced in each worker. So if there's an active snapshot in the
> leader, we ought to make that active in all the workers, and if
> there's a transaction snapshot in the leader, we ought to make that
> the transaction snapshot in all of the workers.
>
> But I see now that my thinking was fuzzy, and I'm going to blame that
> on the name GetTransactionSnapshot() being slightly misleading. If
> IsolationUsesXactSnapshot() is true, then there's really such a thing
> as a transaction snapshot and reproducing that in the worker is a
> sensible thing to do. But when !IsolationUsesXactSnapshot(),
> GetTransactionSnapshot() doesn't just "get the transaction snapshot",
> because there isn't any such thing. It takes a whole new snapshot, on
> the theory that you wouldn't be calling this function unless you had
> finished up with the snapshot you got the last time you called this
> function. And in the case of initiating parallel query, that is the
> wrong thing.
>
> I think that, at least in the case where IsolationUsesXactSnapshot()
> is true, we need to make sure that calling GetTransactionSnapshot() in
> a worker produces the same result that it would have produced in the
> leader. Say one of the workers calls an sql or plpgsql function and
> that function runs a bunch of SQL statements. It seems to me that
> there's probably a way for this to result in calls inside the worker
> to GetTransactionSnapshot(), and if that doesn't return the same
> snapshot as in the leader, then we've broken MVCC.
>
> What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
> to just skip this altogether in that case. Certainly what we're doing
> can't be right, because copying a snapshot that wouldn't have been
> taken without parallel query can't ever be the right thing to do.
> Perhaps we need to copy something else instead. I'm not really sure.
>
> So I think v2 is probably on the right track, but wrong when the
> transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
> and v4 just seem like unprincipled hacks that try to avoid the
> assertion failure by lying about whether there's a problem.
>

Many thanks for taking time to respond to this (and thanks to Thomas Munro too).
It's much appreciated, as this is a complex area.
For the time being, I'll reinstate the v2 patch (say as v5) as a
partial solution, and then work on addressing the REPEATABLE READ and
SERIALIZABLE transaction isolation levels, which you point out are not
handled correctly by the patch.

Regards,
Greg Nancarrow
Fujitsu Australia


v5-0001-PG15-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-21 Thread Robert Haas
On Wed, Jul 14, 2021 at 10:34 PM Greg Nancarrow  wrote:
> Unfortunately there is currently no test, code-comment, README or
> developer-discussion that definitively determines which approach (v2
> vs v3/v4) is a valid fix for this issue.
> We don't know if having both the transaction and active snapshots in a
> parallel worker is intentional or not, and if so, why so?
> (certainly in the non-parallel case of the same statement execution,
> there is only one snapshot in question here - the obtained transaction
> snapshot is pushed as the active snapshot, as it is done in 95% of
> cases in the code)
> It seems that only the original code authors know how the snapshot
> handling in parallel-workers is MEANT to work, and they have yet to
> speak up about it here.
> At this point, we can only all agree that there is a problem to be fixed here.

Hi.

Thanks to Thomas Munro for drawing my attention to this thread. I
wasn't intentionally ignoring it, but there's a lot of email in the
world and only so much time.

When I wrote this code originally, the idea that I had in mind was
simply this: whatever state we have in the leader ought to be
reproduced in each worker. So if there's an active snapshot in the
leader, we ought to make that active in all the workers, and if
there's a transaction snapshot in the leader, we ought to make that
the transaction snapshot in all of the workers.

But I see now that my thinking was fuzzy, and I'm going to blame that
on the name GetTransactionSnapshot() being slightly misleading. If
IsolationUsesXactSnapshot() is true, then there's really such a thing
as a transaction snapshot and reproducing that in the worker is a
sensible thing to do. But when !IsolationUsesXactSnapshot(),
GetTransactionSnapshot() doesn't just "get the transaction snapshot",
because there isn't any such thing. It takes a whole new snapshot, on
the theory that you wouldn't be calling this function unless you had
finished up with the snapshot you got the last time you called this
function. And in the case of initiating parallel query, that is the
wrong thing.

I think that, at least in the case where IsolationUsesXactSnapshot()
is true, we need to make sure that calling GetTransactionSnapshot() in
a worker produces the same result that it would have produced in the
leader. Say one of the workers calls an sql or plpgsql function and
that function runs a bunch of SQL statements. It seems to me that
there's probably a way for this to result in calls inside the worker
to GetTransactionSnapshot(), and if that doesn't return the same
snapshot as in the leader, then we've broken MVCC.

What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
to just skip this altogether in that case. Certainly what we're doing
can't be right, because copying a snapshot that wouldn't have been
taken without parallel query can't ever be the right thing to do.
Perhaps we need to copy something else instead. I'm not really sure.

So I think v2 is probably on the right track, but wrong when the
transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
and v4 just seem like unprincipled hacks that try to avoid the
assertion failure by lying about whether there's a problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-14 Thread Greg Nancarrow
On Sat, Jul 10, 2021 at 3:36 AM Tomas Vondra
 wrote:
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
>

Unfortunately there is currently no test, code-comment, README or
developer-discussion that definitively determines which approach (v2
vs v3/v4) is a valid fix for this issue.
We don't know if having both the transaction and active snapshots in a
parallel worker is intentional or not, and if so, why so?
(certainly in the non-parallel case of the same statement execution,
there is only one snapshot in question here - the obtained transaction
snapshot is pushed as the active snapshot, as it is done in 95% of
cases in the code)
It seems that only the original code authors know how the snapshot
handling in parallel-workers is MEANT to work, and they have yet to
speak up about it here.
At this point, we can only all agree that there is a problem to be fixed here.

My concern with the v3/v4 patch approach is that because the
parallel-workers use a later snapshot to what is actually used in the
execution context for the statement in the parallel leader, then it is
possible for the parallel leader and parallel workers to have
different transaction visibility, and surely this cannot be correct.
For example, suppose a transaction that deletes a row, completes in
the window between these two snapshots.
Couldn't the row be visible to the parallel workers but not to the
parallel leader?
My guess is that currently there are not enough
concurrent-transactions tests to expose such a problem, and the window
here is fairly small.

So we can fiddle xmin values to avoid the immediate Assert issue here,
but it's not addressing potential xmax-related issues.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-13 Thread Pavel Borisov
>
> > So yeah, I think this is due to confusion with two snapshots and
> > failing to consider both of them when calculating TransactionXmin.
> >
> > But I think removing one of the snapshots (as the v2 does it) is rather
> > strange too. I very much doubt having both the transaction and active
> > snapshots in the parallel worker is not intentional, and Pavel may very
> > well be right this breaks isolation levels that use the xact snapshot
> > (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
> >
> > So I think we need to keep both snapshots, but fix TransactionXmin to
> > consider both of them - I suppose ParallelWorkerMain could simply look
> > at the two snapshots, and use the minimum. Which is what [1] (already
> > linked by Pavel) talks about, although that's related to concerns about
> > one of the processes dying, which is not what's happening here.
> >
> >
> > I'm wondering what consequences this may have on production systems,
> > though. We've only seen this failing because of the assert, so what
> > happens when the build has asserts disabled?

> > Looking at SubTransGetTopmostTransaction, it seems the while loop ends
> > immediately (because it's pretty much the opposite of the assert), so we
> > just return the "xid" as topmost XID. But will that produce incorrect
> > query results, or what?
>
I haven't seen anything incorrect on production systems with asserts
turned off, but I should note this assertion is not immediately
reproducible, so it is not that easy to catch possible logical
inconsistencies in parallel scan results. I've noticed that the subxids
cache is used in most cases, and even then this assertion is rare in
parallel scans. Maybe that is why we lived with this bug unnoticed so
long.

> https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de
> 
>
> PFA v4 patch based on the ideas above.
>
> In principle I see little difference with v3. But I agree it is more
> correct.
>
> I did test this patch on Linux and MacOS using the testing algorithm
> above and got no error. On the master branch before the patch I still
> see this error.
>

I've tested the patch and the error doesn't appear anymore.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-13 Thread Maxim Orlov

On 2021-07-09 20:36, Tomas Vondra wrote:

Hi,

I took a quick look on this - I'm no expert in the details of snapshots,
so take my comments with a grain of salt.

AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
Greg is right the v3 patch does not seem like the right (or correct)
solution, for a couple reasons:


1) It fixes the symptoms, not the cause. If we set TransactionXmin to a
bogus value, this only fixes it locally in SubTransGetTopmostTransaction
but I'd be surprised if other places did not have the same issue.


2) The xid/TransactionXmin comparison in the v2 fix:

  xid = xid > TransactionXmin ? xid : TransactionXmin;

seems broken, because it compares the XIDs directly, but that's not
going to work after generating enough transactions.


3) But even if this uses TransactionIdFollowsOrEquals, it seems very
strange because the "xid" comes from

  XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))

i.e. it's the raw xmin from the tuple, so why should we be setting it to
TransactionXmin? That seems pretty strange, TBH.


So yeah, I think this is due to confusion with two snapshots and failing
to consider both of them when calculating TransactionXmin.

But I think removing one of the snapshots (as the v2 does it) is rather
strange too. I very much doubt having both the transaction and active
snapshots in the parallel worker is not intentional, and Pavel may very
well be right this breaks isolation levels that use the xact snapshot
(i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.

So I think we need to keep both snapshots, but fix TransactionXmin to
consider both of them - I suppose ParallelWorkerMain could simply look
at the two snapshots, and use the minimum. Which is what [1] (already
linked by Pavel) talks about, although that's related to concerns about
one of the processes dying, which is not what's happening here.


I'm wondering what consequences this may have on production systems,
though. We've only seen this failing because of the assert, so what
happens when the build has asserts disabled?

Looking at SubTransGetTopmostTransaction, it seems the while loop ends
immediately (because it's pretty much the opposite of the assert), so we
just return the "xid" as topmost XID. But will that produce incorrect
query results, or what?



regards


[1]
https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de


PFA v4 patch based on the ideas above.

In principle I see little difference with v3. But I agree it is more
correct.

I did test this patch on Linux and MacOS using the testing algorithm
above and got no error. On the master branch before the patch I still
see this error.


---
Best regards,
Maxim Orlov

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 3550ef13baa..6443d492501 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -34,6 +34,7 @@
 #include "pgstat.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
+#include "storage/procarray.h"
 #include "storage/sinval.h"
 #include "storage/spin.h"
 #include "tcop/tcopprot.h"
@@ -45,6 +46,7 @@
 #include "utils/snapmgr.h"
 #include "utils/typcache.h"
 
+
 /*
  * We don't want to waste a lot of memory on an error queue which, most of
  * the time, will process only a handful of small messages.  However, it is
@@ -1261,6 +1263,7 @@ ParallelWorkerMain(Datum main_arg)
 	char	   *uncommittedenumsspace;
 	StringInfoData msgbuf;
 	char	   *session_dsm_handle_space;
+	Snapshot	asnap;
 
 	/* Set flag to indicate that we're initializing a parallel worker. */
 	InitializingParallelWorker = true;
@@ -1415,7 +1418,15 @@ ParallelWorkerMain(Datum main_arg)
 
 	/* Restore active snapshot. */
 	asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
-	PushActiveSnapshot(RestoreSnapshot(asnapspace));
+	asnap = RestoreSnapshot(asnapspace);
+	PushActiveSnapshot(asnap);
+
+	/*
+	 * We may have different xmins in active and transaction snapshots in
+	 * parallel workers. So, use minimum for TransactionXmin.
+	 */
+	if (TransactionIdFollows(TransactionXmin, asnap->xmin))
+		ProcArrayInstallRestoredXmin(asnap->xmin, fps->parallel_leader_pgproc);
 
 	/*
 	 * We've changed which tuples we can see, and must therefore invalidate


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-09 Thread Tomas Vondra
Hi,

I took a quick look on this - I'm no expert in the details of snapshots,
so take my comments with a grain of salt.

AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
Greg is right the v3 patch does not seem like the right (or correct)
solution, for a couple reasons:


1) It fixes the symptoms, not the cause. If we set TransactionXmin to a
bogus value, this only fixes it locally in SubTransGetTopmostTransaction
but I'd be surprised if other places did not have the same issue.


2) The xid/TransactionXmin comparison in the v2 fix:

  xid = xid > TransactionXmin ? xid : TransactionXmin;

seems broken, because it compares the XIDs directly, but that's not
going to work after generating enough transactions.
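
For comparison, a wraparound-safe version of that line would be
something like:

    /* XIDs wrap around, so use the dedicated comparison helpers. */
    if (!TransactionIdFollowsOrEquals(xid, TransactionXmin))
        xid = TransactionXmin;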


3) But even if this uses TransactionIdFollowsOrEquals, it seems very
strange because the "xid" comes from

  XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))

i.e. it's the raw xmin from the tuple, so why should we be setting it to
TransactionXmin? That seems pretty strange, TBH.


So yeah, I think this is due to confusion with two snapshots and failing
to consider both of them when calculating TransactionXmin.

But I think removing one of the snapshots (as the v2 does it) is rather
strange too. I very much doubt having both the transaction and active
snapshots in the parallel worker is not intentional, and Pavel may very
well be right this breaks isolation levels that use the xact snapshot
(i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.

So I think we need to keep both snapshots, but fix TransactionXmin to
consider both of them - I suppose ParallelWorkerMain could simply look
at the two snapshots, and use the minimum. Which is what [1] (already
linked by Pavel) talks about, although that's related to concerns about
one of the processes dying, which is not what's happening here.


I'm wondering what consequences this may have on production systems,
though. We've only seen this failing because of the assert, so what
happens when the build has asserts disabled?

Looking at SubTransGetTopmostTransaction, it seems the while loop ends
immediately (because it's pretty much the opposite of the assert), so we
just return the "xid" as topmost XID. But will that produce incorrect
query results, or what?
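For reference, the loop in question looks roughly like this (paraphrased
from src/backend/access/transam/subtrans.c):

    /* Can't ask about stuff that might not be around anymore */
    Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

    while (TransactionIdIsValid(parentXid))
    {
        previousXid = parentXid;
        if (TransactionIdPrecedes(parentXid, TransactionXmin))
            break;
        parentXid = SubTransGetParent(parentXid);
    }
    return previousXid;

So with xid < TransactionXmin the first iteration breaks out immediately
and the xid is returned unchanged - meaning a subxact XID would not get
mapped to its top-level XID before the visibility check.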



regards


[1]
https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-24 Thread Pavel Borisov
By the way, in the initial discussion on the parallel infrastructure,
https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de
I've seen a proposal to set the worker's PGXACT->xmin
to the minimum of the top transaction id and the snapshots' xmins.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-24 Thread Pavel Borisov
>
> Have you found a case that the v2 patch breaks?
>
> The v2 patch does follow the analysis in the beginning of the
> discussion, which identified that in setting up parallel workers, a
> "later transaction snapshot" was taken than the one actually used in
> the statement execution, and that's what eventually leads to the
> observed Assertion failure.
> The original problem reporter stated: "So the root cause is the
> Parallel Workers process set the TransactionXmin with later
> transcation snapshot".
> Also, as far as I can see, it makes no sense to pass parallel workers
> both an active snapshot and a (later) transaction snapshot. In the
> leader, prior to execution and running parallel workers, a transaction
> snapshot is obtained and pushed as the active snapshot (note: only ONE
> snapshot at this point). It makes little sense to me to then obtain
> ANOTHER transaction snapshot when setting up parallel workers, and
> pass that (only) to the workers along with the original (earlier)
> active snapshot. (If there is a reason for passing the TWO snapshots
> to parallel workers, original code authors and/or snapshot experts
> should speak up now, and code comments should be added accordingly to
> explain the purpose and how it is MEANT to work)
> This is why the v2 patch updates the code to just pass one snapshot
> (the active snapshot) to the parallel workers, which gets restored in
> each worker as the transaction snapshot and set as the active snapshot
> (so is then consistent with the snapshot setup in the parallel
> leader).
>

I fully agree that _probably_ two snapshots are not needed for read-only
transactions. But I think we need to be very careful with everything
related to transaction integrity, as there are different isolation levels
etc. The mere fact that I cannot break something does not seem like a
solid enough argument.

Robert, could you give us some help and insight into why there is a need
for the separate active and transaction snapshots in the parallel
infrastructure patch, as per your commit 924bcf4f16d?

Until then, just a workaround like the v3 patch seems safer to me (still,
I am not a big specialist in this).

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-22 Thread Greg Nancarrow
On Wed, Jun 23, 2021 at 1:06 AM Maxim Orlov  wrote:
>
> The analysis in the beginning of the discussion seems to be right, but
> the fix v2 looks too invasive for me.
>
> Personally, I'd like not to remove snapshot even if transaction is
> read-only. I propose to consider "xid < TransactionXmin" as a legit case
> and just promote xid to TransactionXmin.
>
> It's annoying this old bug still not fixed. What do you think?


This v3 patch doesn't look right to me at all.
It's not addressing the fundamental problem; it just seems to be
working around it by fiddling an xid value to avoid an assertion
failure.

I really can't see how the v2 patch "removes a snapshot" in the
read-only transaction case, and is "invasive".
Have you found a case that the v2 patch breaks?

The v2 patch does follow the analysis in the beginning of the
discussion, which identified that in setting up parallel workers, a
"later transaction snapshot" was taken than the one actually used in
the statement execution, and that's what eventually leads to the
observed Assertion failure. The original problem reporter stated: "So
the root cause is the Parallel Workers process set the TransactionXmin
with later transaction snapshot".

Also, as far as I can see, it makes no sense to pass parallel workers
both an active snapshot and a (later) transaction snapshot. In the
leader, prior to execution and running parallel workers, a transaction
snapshot is obtained and pushed as the active snapshot (note: only ONE
snapshot at this point). It makes little sense to me to then obtain
ANOTHER transaction snapshot when setting up parallel workers, and
pass that (only) to the workers along with the original (earlier)
active snapshot. (If there is a reason for passing the TWO snapshots
to parallel workers, original code authors and/or snapshot experts
should speak up now, and code comments should be added accordingly to
explain the purpose and how it is MEANT to work.)

This is why the v2 patch updates the code to just pass one snapshot
(the active snapshot) to the parallel workers, which gets restored in
each worker as the transaction snapshot and set as the active snapshot
(so it is then consistent with the snapshot setup in the parallel
leader); see the sketch below.

The v3 patch doesn't address any of this at all. It just checks
whether the xid is < the TransactionXmin of the snapshot, and if so,
sets the xid to TransactionXmin, thus avoiding the Assertion failure.
But that TransactionXmin was from the "later transaction snapshot",
which was only obtained for the workers and doesn't necessarily match
the earlier active snapshot that is associated with the actual
statement execution. As far as I am concerned, that "later transaction
snapshot", taken for use by the workers, doesn't seem right, so that's
why the v2 patch removes the use of it and only uses the active
snapshot, for consistency with the rest of the code.

I think that if there were something fundamentally wrong with the v2
patch's removal of that "later transaction snapshot" used by the
parallel workers, then surely there would be a test failure somewhere
- but that is not the case.
I think the v2 patch should be restored as the proposed solution here.
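To make that concrete, the net worker-side effect of the v2 approach is
roughly the following (an illustrative sketch only, not the literal patch
hunks; the names follow the existing ParallelWorkerMain() code):

    /* ParallelWorkerMain(): only ONE serialized snapshot is passed now */
    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    asnap = RestoreSnapshot(asnapspace);

    /* restore it as the transaction snapshot AND push it as active */
    RestoreTransactionSnapshot(asnap, fps->parallel_leader_pgproc);
    PushActiveSnapshot(asnap);

That way the worker's TransactionXmin is derived from the same snapshot
that the scan actually uses.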

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-22 Thread Maxim Orlov
The analysis at the beginning of the discussion seems to be right, but
the v2 fix looks too invasive to me.

Personally, I'd prefer not to remove a snapshot even if the transaction is
read-only. I propose to consider "xid < TransactionXmin" as a legit case
and just promote xid to TransactionXmin.

It's annoying that this old bug is still not fixed. What do you think?
---
Best regards,
Maxim Orlov.

diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6a8e521f894..ce7c90aa38f 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/slru.h"
+#include "access/parallel.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
 #include "pg_trace.h"
@@ -152,6 +153,21 @@ SubTransGetTopmostTransaction(TransactionId xid)
 	TransactionId parentXid = xid,
 				previousXid = xid;
 
+	/*
+	 * We may have different xmins in the active and transaction snapshots in
+	 * parallel workers, and TransactionXmin can have been set from the later
+	 * of them. So, we do this to avoid the assert below when the restored
+	 * active snapshot has a smaller xmin value.
+	 *
+	 * XXX: the same might be fixed by prohibiting read-only transactions (like
+	 * parallel scans) from advancing TransactionXmin.
+	 */
+	if (IsParallelWorker())
+	{
+		xid = xid > TransactionXmin ? xid : TransactionXmin;
+		parentXid = xid;
+		previousXid = xid;
+	}
 	/* Can't ask about stuff that might not be around anymore */
 	Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
 


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-04 Thread Ranier Vilela
On Fri, Jun 4, 2021 at 12:07 PM, Pavel Borisov 
wrote:

> Please, avoid using decimal based values.
>>
>> 128 is multiple of 64.
>>
> It's true that 128 is better to use than 120 but the main problem is not
> in the value but in the fact we never get
> CurrentRunningXacts->subxid_overflow = suboverflowed; with value more
> than 120. This solves the problem but it doesn't seem the right way to fix
> the issue.
>
That seems like a solution to me too.

Instead it's better to process suboverflowed state which is legit itself
> not resulting getting the crash on the Assert.
>
Of course it would be great to find the root of the problem.


> So the discussion of "better" value doesn't seem related to the problem.
> It is for demonstration only.
>
IMHO, you could propose a patch documenting this whole situation and
proposing this workaround.
I've been studying commits, and on several occasions problems have been
fixed like this.
But what matters is documenting the problem.

best regards,
Ranier Vilela


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-04 Thread Pavel Borisov
>
> Please, avoid using decimal based values.
>
> 128 is multiple of 64.
>
It's true that 128 is better to use than 120, but the main problem is not
in the value; it is in the fact that we never get
CurrentRunningXacts->subxid_overflow = suboverflowed; with a value of more
than 120. This avoids the problem, but it doesn't seem the right way to fix
the issue. Instead, it's better to handle the suboverflowed state, which is
legitimate in itself, without crashing on the Assert. So the discussion of
a "better" value doesn't seem related to the problem. It is for demonstration
only.
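(For context: a backend's subxid cache overflows once a transaction has
more than PGPROC_MAX_CACHED_SUBXIDS = 64 subtransaction XIDs. Roughly,
paraphrasing the PG13-era logic in GetNewTransactionId():

    if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
        MyProc->subxids.xids[nxids] = xid;   /* still cached */
    else
        MyPgXact->overflowed = true;         /* snapshots must consult pg_subtrans */

Since sub_120.sql uses 120 subtransactions per transaction, every snapshot
taken while it runs is suboverflowed.)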

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-04 Thread Ranier Vilela
>Just a note here. After examining the core dump I did notice something.

>While in XidInMVCCSnapshot call the snapshot->suboverflowed is set true
>although subxip == NULL and subxcnt == 0. As far as I understand,
>snapshot->suboverflowed is set true in the GetRunningTransactionData
>call.

>And then I decided to put elog around CurrentRunningXacts->subxcnt's
>assigment.
>diff --git a/src/backend/storage/ipc/procarray.c
>b/src/backend/storage/ipc/procarray.c
>index 42a89fc5dc9..3d2db02f580 100644
>--- a/src/backend/storage/ipc/procarray.c
>+++ b/src/backend/storage/ipc/procarray.c
>@@ -2781,6 +2781,9 @@ GetRunningTransactionData(void)
> * increases if slots do.
> */

>+   if (suboverflowed)
>+   elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true");
>+
>CurrentRunningXacts->xcnt = count - subcount;
>CurrentRunningXacts->subxcnt = subcount;
>CurrentRunningXacts->subxid_overflow = suboverflowed;

>... and did get a bunch of messages. I.e. subxid_overflow is set true
>very often.

>I've increased the value of PGPROC_MAX_CACHED_SUBXIDS. Once it becomes
>more than 120 there are no messages and no failed assertions are
>provided any more.

Please, avoid using decimal-based values.

128 is a multiple of 64.

See :

https://github.com/trevstanhope/scratch/blob/master/C/docs/O%27Reilly%20-%20Practical%20C%20Programming%203rd%20Edition.pdf

15.6.1 The Power of Powers of 2

regards,

Ranier Vilela


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-06-04 Thread Maxim Orlov

Just a note here. After examining the core dump I did notice something.

While in the XidInMVCCSnapshot call, snapshot->suboverflowed is set to true
although subxip == NULL and subxcnt == 0. As far as I understand,
snapshot->suboverflowed is set to true in the GetRunningTransactionData
call.

And then I decided to put an elog around the assignment of
CurrentRunningXacts->subxcnt:
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc9..3d2db02f580 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2781,6 +2781,9 @@ GetRunningTransactionData(void)
 	 * increases if slots do.
 	 */
 
+	if (suboverflowed)
+		elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true");
+
 	CurrentRunningXacts->xcnt = count - subcount;
 	CurrentRunningXacts->subxcnt = subcount;
 	CurrentRunningXacts->subxid_overflow = suboverflowed;

... and did get a bunch of messages, i.e. subxid_overflow is set to true
very often.

I've increased the value of PGPROC_MAX_CACHED_SUBXIDS. Once it becomes
more than 120, there are no messages and no failed assertions occur
any more.


---
Best regards,
Maxim Orlov.

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc9..3d2db02f580 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2781,6 +2781,9 @@ GetRunningTransactionData(void)
 	 * increases if slots do.
 	 */
 
+	if (suboverflowed)
+		elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true");
+
 	CurrentRunningXacts->xcnt = count - subcount;
 	CurrentRunningXacts->subxcnt = subcount;
 	CurrentRunningXacts->subxid_overflow = suboverflowed;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index be67d8a8616..8e1a3d1c6bb 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -33,7 +33,7 @@
  * listed anywhere in the PGPROC array is not a running transaction.  Else we
  * have to look at pg_subtrans.
  */
-#define PGPROC_MAX_CACHED_SUBXIDS 64	/* XXX guessed-at value */
+#define PGPROC_MAX_CACHED_SUBXIDS 120	/* XXX guessed-at value */
 
 typedef struct XidCacheStatus
 {


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-24 Thread Greg Nancarrow
On Mon, May 24, 2021 at 11:56 PM Pavel Borisov  wrote:
>
> Using a recipe similar to what has been described above in the thread, I 
> reliably reproduced the bug in many Postgres versions. (v.11, v.13 etc.).
> 1. Make & make install
> 2. Make check
> 3. run  SubTransGetTopmostTransaction-rep.sh in the Postgres source code dir.
>

Thanks, using the latest PG14 code
(713a431c781fbfe1a22fae4991836077f0f4c513), with no modifications, and
building with "--enable-cassert", your script reproduced the coredump
on my system in about 10 minutes, on the 1st run.

#0  0x7f1542d1137f in raise () from /lib64/libc.so.6
#1  0x7f1542cfbdb5 in abort () from /lib64/libc.so.6
#2  0x009380cb in ExceptionalCondition ()
#3  0x005aadbc in SubTransGetTopmostTransaction ()
#4  0x009741ef in XidInMVCCSnapshot ()
#5  0x00563326 in HeapTupleSatisfiesVisibility ()
#6  0x00559d50 in heapgetpage ()
#7  0x0055a0e9 in heapgettup_pagemode ()
#8  0x0055b98e in heap_getnextslot ()
#9  0x006df713 in SeqNext ()
#10 0x006b725f in ExecScan ()
#11 0x006bef41 in fetch_input_tuple ()
#12 0x006c1f6f in ExecAgg ()
#13 0x006ade83 in standard_ExecutorRun ()
#14 0x006b21ed in ParallelQueryMain ()
#15 0x005a8c11 in ParallelWorkerMain ()
#16 0x00795f3c in StartBackgroundWorker ()
#17 0x007a1bc1 in maybe_start_bgworkers ()
#18 0x007a25c5 in sigusr1_handler ()
#19 <signal handler called>
#20 0x7f1542dce25b in select () from /lib64/libc.so.6
#21 0x007a2cf6 in ServerLoop ()
#22 0x007a4659 in PostmasterMain ()
#23 0x005042ed in main ()

(Using a debug-build, it seems much harder to reproduce on my system)

With my patch applied, I've had no crashes from many runs.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-24 Thread Pavel Borisov
On Mon, May 24, 2021 at 09:22, Greg Nancarrow  wrote:

> On Mon, May 24, 2021 at 2:50 PM Michael Paquier 
> wrote:
> >
> > On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > > Keep cfbot happy, use the PG14 patch as latest.
> >
> > This stuff is usually very tricky.
>
> Agreed. That's why I was looking for experts in this snapshot-handling
> code, to look closer at this issue, check my proposed fix, come up
> with a better solution etc.
>
> >Do we have a way to reliably
> > reproduce the report discussed here?
>

Using a recipe similar to what has been described above in the thread, I
reliably reproduced the bug in several Postgres versions (v11, v13, etc.):
1. Make & make install
2. Make check
3. Run SubTransGetTopmostTransaction-rep.sh in the Postgres source code
dir.

The test fails with coredumps in around 10 minutes. With the fix applied,
it has never failed yet.
(Though the transaction snapshot kitchen is indeed tricky, and I am not 100%
sure the fix does the right thing, safe in all circumstances.)
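#!/bin/bash
# Build a throwaway PostgreSQL cluster with aggressive parallel settings,
# then run the sub_120.sql workload via pgbench to try to trigger the
# assertion failure described in this thread.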
PGROOT=`pwd`/tmp_install
PGDB=`pwd`/tmpdb
PGBIN=$PGROOT/usr/local/pgsql/bin

export PATH="$PGBIN:$PATH"
export LD_LIBRARY_PATH="$PGROOT/usr/local/pgsql/lib"
rm -rf "$PGDB"; $PGBIN/initdb -D "$PGDB"

#echo -e "session_pool_size=2" >> $PGDB/postgresql.auto.conf
echo "
#  fsync=off
max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

" >> $PGDB/postgresql.auto.conf

$PGBIN/pg_ctl -w -t 5 -D "$PGDB" -l server.log start
$PGBIN/createdb test
export PGDATABASE=test
$PGBIN/psql -f init_test.sql
$PGBIN/psql -t -c "SELECT version()"
# $PGBIN/pgbench -n -r -f .../sub_120.sql -c 200 -j 200 -T 120
$PGBIN/pgbench -n -r -f sub_120.sql -c 25 -j 25 -T 1800
$PGBIN/pg_ctl -w -t 5 -D "$PGDB" stop
sleep 2
coredumpctl


init_test.sql
Description: Binary data


sub_120.sql
Description: Binary data


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-23 Thread Greg Nancarrow
On Mon, May 24, 2021 at 2:50 PM Michael Paquier  wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code, to look closer at this issue, check my proposed fix, come up
with a better solution etc.

>Do we have a way to reliably
> reproduce the report discussed here?

I couldn't reproduce it in my environment (though I could understand
what was going wrong, based on the description provided).
houzj (houzj.f...@fujitsu.com) was able to reproduce it in his
environment and kindly provided to me the following information:
(He said that he followed most of the steps described by the original
problem reporter, Pengcheng, but perhaps steps 2 and 7 are a little
different from his steps. See the emails higher in the thread for the
two scripts "init_test.sql" and "sub_120.sql")

===

1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file
"src/include/access/subtrans.h" line number 15.
2, Configure with asserts enabled and build it (./configure
--enable-cassert --prefix=/home/pgsql).
3, init a new database cluster.
4, modify postgres.conf and add some parameters as below. As the
coredump comes from a parallel scan, we adjust the parallel settings to
make it easy to reproduce.

  max_connections = 2000

  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql  in attachment to create tables.
7, use pgbench with script sub_120.sql in attachment to test it. Try
it a few times; you should get the coredump file.
pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 12000
   (If you cannot reproduce it, maybe try running two pgbench instances
in parallel at the same time.)

In my environment (CentOS 8.2, 128GB RAM, 40 processors, SAS disk,
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz),
sometimes I can reproduce it in about 5 minutes, but sometimes it needs
about half an hour.

Best regards,
houzj

===

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-23 Thread Michael Paquier
On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> Keep cfbot happy, use the PG14 patch as latest.

This stuff is usually very tricky.  Do we have a way to reliably
reproduce the report discussed here?
--
Michael


signature.asc
Description: PGP signature


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-23 Thread Greg Nancarrow
On Thu, May 20, 2021 at 4:08 PM Greg Nancarrow  wrote:
>

Keep cfbot happy, use the PG14 patch as latest.

Regards,
Greg Nancarrow
Fujitsu Australia


v2-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-20 Thread Greg Nancarrow
On Thu, May 20, 2021 at 11:18 AM Pengchengliu  wrote:
>
> Hi Greg,
>Thanks a lot for you explanation and your fix.
>
>I think your fix can resolve the core dump issue. As with your fix, 
> parallel process reset Transaction Xmin from ActiveSnapshot.
>But it will change Transaction snapshot for all parallel  scenarios. I 
> don't know whether it bring in other issue.
>For test only, I think it is enough.
>
>So is there anybody can explain what's exactly difference between 
> ActiveSnapshot and TransactionSnapshot in parallel work process.
>Then maybe we can find a better solution and try to fix it really.
>

I am proposing the attached patch to fix this issue (patches for both
PG13.2 and latest PG14 source are provided).

Perhaps this will trigger others who better know the intricacies of
snapshot handling, and know the reasons and history behind why the
transaction_snapshot and active_snapshot are currently passed
separately to parallel workers, to speak up here and discuss the issue
further, and check my fix (and note, I haven't attempted to modify
README.parallel in the patch until I get further clarification on this
issue).
Perhaps someone can explain the purpose of calling
GetTransactionSnapshot() AGAIN in InitializeParallelDSM(), and how this
is consistent with the current ActiveSnapshot?
AFAICS, that doesn't seem correct, and that's why the patch removes it.

I've rebuilt Postgres with the patch applied and run the regression
tests, with and without "force_parallel_mode=regress", and all tests
pass.
So if the fix is wrong, no tests currently exist that detect issues with it.

Regards,
Greg Nancarrow
Fujitsu Australia


v1-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


v1-0001-PG13_2-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-19 Thread Pengchengliu
Hi Greg,
   Thanks a lot for your explanation and your fix.

   I think your fix can resolve the core dump issue: with your fix, the
parallel process resets TransactionXmin from the ActiveSnapshot.
   But it will change the transaction snapshot for all parallel scenarios,
and I don't know whether it brings in other issues.
   For testing only, I think it is enough.

   So, can anybody explain what exactly the difference is between the
ActiveSnapshot and the TransactionSnapshot in a parallel worker process?
   Then maybe we can find a better solution and try to fix it properly.

Thanks
Pengcheng 

-Original Message-
From: Greg Nancarrow  
Sent: 18 May 2021 17:15
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert 
coredump

On Tue, May 18, 2021 at 11:27 AM Pengchengliu  wrote:
>
> Hi Greg,
>
>Actually I am very confused about ActiveSnapshot and TransactionSnapshot. 
> I don't know why main process send ActiveSnapshot and TransactionSnapshot 
> separately.  And what is exact difference between them?
>If you know that, could you explain that for me? It will be very 
> appreciated.

In the context of a parallel-worker, I am a little confused too, so I can't 
explain it either.
It is not really explained in the file
"src\backend\access\transam\README.parallel", it only mentions the following as 
part of the state that needs to be copied to each worker:

 - The transaction snapshot.
 - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to plan 
execution, GetTransactionSnapshot() was called and its return value was stored 
in both the QueryDesc and the estate (es_snapshot), which was then pushed on 
the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the last 
snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is not 
clear to me (because isn't then the ActiveSnapshot a potentially earlier 
snapshot? - which it shouldn't be, AFAIK. And also, it's then different to the 
non-parallel case).

>Before we know them exactly, I think we should not modify the 
> TransactionSnapshot to ActiveSnapshot in main process. If it is, the main 
> process should send ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have a 
development environment, which I assume you have). Sure, IF it was deemed a 
proper solution, you'd only send the one snapshot, and adjust accordingly in 
ParallelWorkerMain(), but we need not worry about that in order to test it.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-18 Thread Greg Nancarrow
On Tue, May 18, 2021 at 9:41 PM houzj.f...@fujitsu.com
 wrote:
>
> > To: Pengchengliu 
> > Cc: Greg Nancarrow ; Andres Freund 
> > ; PostgreSQL-development 
> > Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert 
> > coredump
>
> > I've also seen the reports of the same 
> > Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) with a 
> > subsequent crash in a parallel worker in PostgreSQL v11-based
> > build, Though I was unable to investigate deeper and reproduce the issue. 
> > The details above in the thread make me think it is a real and 
> > long-time-persistent error that is
> > surely worth to be fixed.
>
> I followed Liu's reproduce steps and successfully reproduce it in about half 
> an hour running.
> My compile option is : " ./configure --enable-cassert --prefix=/home/pgsql".
>
> After applying greg-san's change, the coredump did not happened in two 
> hour(it is still running).
> Note, I have not taken a deep look into the change, just provide some test 
> information in advance.
>

+1
Thanks for doing that.
I'm unsure if that "fix" is the right approach, so please investigate it too.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-18 Thread houzj.f...@fujitsu.com
> To: Pengchengliu 
> Cc: Greg Nancarrow ; Andres Freund ; 
> PostgreSQL-development 
> Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert 
> coredump

> I've also seen the reports of the same 
> Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) with a subsequent 
> crash in a parallel worker in PostgreSQL v11-based 
> build, Though I was unable to investigate deeper and reproduce the issue. The 
> details above in the thread make me think it is a real and 
> long-time-persistent error that is 
> surely worth to be fixed.

I followed Liu's reproduction steps and successfully reproduced it in about half 
an hour of running.
My configure options are: "./configure --enable-cassert --prefix=/home/pgsql".

After applying greg-san's change, the coredump has not happened in two hours (it 
is still running).
Note, I have not taken a deep look into the change; I'm just providing some test 
information in advance.

Best regards,
houzj


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-18 Thread Greg Nancarrow
On Tue, May 18, 2021 at 11:27 AM Pengchengliu  wrote:
>
> Hi Greg,
>
>Actually I am very confused about ActiveSnapshot and TransactionSnapshot. 
> I don't know why main process send ActiveSnapshot and TransactionSnapshot 
> separately.  And what is exact difference between them?
>If you know that, could you explain that for me? It will be very 
> appreciated.

In the context of a parallel-worker, I am a little confused too, so I
can't explain it either.
It is not really explained in the file
"src\backend\access\transam\README.parallel", it only mentions the
following as part of the state that needs to be copied to each worker:

 - The transaction snapshot.
 - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to
plan execution, GetTransactionSnapshot() was called and its return
value was stored in both the QueryDesc and the estate (es_snapshot),
which was then pushed on the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the
last snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is
not clear to me (because isn't then the ActiveSnapshot a potentially
earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then
different to the non-parallel case).
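Condensed, the leader-side sequence I observed looks like this (an
illustrative paraphrase of the flow, not the literal source):

    /* leader, before execution: */
    Snapshot snap = GetTransactionSnapshot();
    PushActiveSnapshot(snap);        /* snap becomes the active snapshot */

    /* later, leader, in InitializeParallelDSM(): */
    Snapshot transaction_snapshot = GetTransactionSnapshot(); /* taken AGAIN */
    Snapshot active_snapshot = GetActiveSnapshot();           /* the earlier snap */

Under READ COMMITTED, the second GetTransactionSnapshot() call can return
a newer snapshot than the one already pushed, so the two snapshots that
get serialized for the workers can differ.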

>Before we know them exactly, I think we should not modify the 
> TransactionSnapshot to ActiveSnapshot in main process. If it is, the main 
> process should send ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have
a development environment, which I assume you have). Sure, IF it was
deemed a proper solution, you'd only send the one snapshot, and adjust
accordingly in ParallelWorkerMain(), but we need not worry about that
in order to test it.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-18 Thread Pavel Borisov
I've also seen reports of the same Assert(TransactionIdFollowsOrEquals(xid,
TransactionXmin)) with a subsequent crash in a parallel worker in a
PostgreSQL v11-based build, though I was unable to investigate deeper and
reproduce the issue. The details above in the thread make me think it is a
real and long-standing error that is surely worth fixing.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-17 Thread Pengchengliu
Hi Greg,
  > I actually think that the Assert in SubTransGetTopmostTransaction() is 
correct, but in the parallel-worker case, the snapshots are not being setup 
correctly.
  
   I agree with you that the Assert in SubTransGetTopmostTransaction() is
correct. The root cause is that TransactionXmin is not being set up
correctly in the parallel worker.

   Actually, I am very confused about the ActiveSnapshot and the
TransactionSnapshot. I don't know why the main process sends the
ActiveSnapshot and the TransactionSnapshot separately, and what the exact
difference between them is.
   If you know, could you explain it to me? It would be very much
appreciated.
   Before we know them exactly, I think we should not change the
TransactionSnapshot to the ActiveSnapshot in the main process. If we do,
the main process should send the ActiveSnapshot only.

Thanks
Pengcheng
   
   

-Original Message-
From: Greg Nancarrow  
Sent: 17 May 2021 20:59
To: 刘鹏程 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert 
coredump

On Sat, May 15, 2021 at 12:37 PM 刘鹏程  wrote:
>
>
>   BTW, I test it in a high performance server. It is verly easily be 
> reproduced. My colleague and me use different environment both can reproduce 
> it.
>

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can certainly see 
how, for the snapshots used in the parallel worker case, the Active snapshot 
used is potentially an earlier snapshot than the Transaction snapshot. I don't 
know why it is getting a newer Transaction snapshot in InitializeParallelDSM(), 
when it has previously pushed the return value of GetTransactionSnapshot() as 
the Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how to 
resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is correct, 
but in the parallel-worker case, the snapshots are not being setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

  /* We might be running in a very short-lived memory context. */




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-17 Thread Greg Nancarrow
On Sat, May 15, 2021 at 12:37 PM 刘鹏程  wrote:
>
>
>   BTW, I test it in a high performance server. It is verly easily be 
> reproduced. My colleague and me use different environment both can reproduce 
> it.
>

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can
certainly see how, for the snapshots used in the parallel worker case,
the Active snapshot used is potentially an earlier snapshot than the
Transaction snapshot. I don't know why it is getting a newer
Transaction snapshot in InitializeParallelDSM(), when it has
previously pushed the return value of GetTransactionSnapshot() as the
Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how
to resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is
correct, but in the parallel-worker case, the snapshots are not being
setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

  /* We might be running in a very short-lived memory context. */




RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-17 Thread Pengchengliu
Hi Tom & Robert,
  Could you review this Assert(TransactionIdFollowsOrEquals(xid, 
TransactionXmin)) in SubTransGetTopmostTransaction. 
  I think this assert is unsuitable for parallel work process.

 Before we discuss it in 
  
https://www.postgresql-archive.org/Parallel-scan-with-SubTransGetTopmostTransaction-assert-coredump-td6197408.html
 
Thanks
Pengcheng

-Original Message-
From: Greg Nancarrow  
Sent: 15 May 2021 0:44
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 8:36 PM Pengchengliu  wrote:
>   Did you use pgbench with the script sub_120.sql which I provide in 
> attachment?

yes

>
>   Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change 
> any codes, now we just use the origin codes in PG13.2.
>

No, I have made no source code changes at all.
That was my suggestion, for you to try - because if the problem is avoided by 
increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it probably indicates 
the overflow condition is affecting the xmin/xmax of the two snapshots such 
that it invalidates the condition that is asserted.


I think one problem is that in your settings, you haven't set 
"max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using parallel 
workers, as the workers can't actually be launched (and this is probably why 
I'm finding it hard to reproduce the issue, if the problem involves snapshot 
suboverflow and parallel workers).
I find that the following settings improve the parallelism per query and the 
whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line:   pgbench  -d postgres -p 33550
-n -r -f sub_120.sql   -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens if you 
use these settings?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line:   pgbench  -d postgres -p 33550  -n -r
-f sub_120.sql   -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-14 Thread Greg Nancarrow
On Fri, May 14, 2021 at 8:36 PM Pengchengliu  wrote:
>   Did you use pgbench with the script sub_120.sql which I provide in 
> attachment?

yes

>
>   Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change 
> any codes, now we just use the origin codes in PG13.2.
>

No, I have made no source code changes at all.
That was my suggestion, for you to try - because if the problem is
avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it
probably indicates the overflow condition is affecting the xmin/xmax
of the two snapshots such that it invalidates the condition that is
asserted.


I think one problem is that in your settings, you haven't set
"max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using
parallel workers, as the workers can't actually be launched (and this
is probably why I'm finding it hard to reproduce the issue, if the
problem involves snapshot suboverflow and parallel workers).
I find that the following settings improve the parallelism per query
and the whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line:   pgbench  -d postgres -p 33550
-n -r -f sub_120.sql   -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens
if you use these settings?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line:   pgbench  -d postgres -p 33550  -n -r
-f sub_120.sql   -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-14 Thread Pengchengliu
Hi Greg,

  Even when you get different xmin values in the active snapshot and the
transaction snapshot, there may be no coredump, as there may be no tuple
(with an xmin) between ActiveSnapshot->xmin and TransactionSnapshot->xmin
that needs to be scanned in the parallel process.

  There is no doubt it is very likely that ActiveSnapshot->xmin precedes
TransactionSnapshot->xmin.

  For this coredump, we must make sure the scan is parallel and the
snapshot has suboverflowed. If the snapshot has not overflowed, you cannot
get the coredump, as the coredump comes from the MVCC check of a parallel
scan when the snapshot has suboverflowed.

  Did you use pgbench with the script sub_120.sql which I provided in the
attachment? The default PGPROC_MAX_CACHED_SUBXIDS is 64. In the script
sub_120.sql, each transaction uses 120 subtransactions, which is much
larger than 64. So while taking the snapshot, it must overflow. I really
don't know why your snapshot is not overflowing.

  Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't
change any code; for now we just use the original code of PG 13.2.

  I have checked the code in the master branch; there is no change to this
mechanism, so this issue should still exist.

  Thanks
  Pengcheng

 

-Original Message-
From: Greg Nancarrow  
Sent: 14 May 2021 16:47
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

 

On Fri, May 14, 2021 at 12:25 PM Pengchengliu <pengcheng...@tju.edu.cn> wrote:
>
> Hi Greg,
>
>   Thanks for your reply and test.
>
>   When the main process gets the transaction snapshot in
> InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot
> xmin very likely follows the active snapshot xmin.
>
>   Using gdb it is easy to verify it.
>   Create the env as below:
>
>   1, Use PG13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster
> database.
>   2, Append the postgres.conf as below:
>
>   max_connections = 2000
>   parallel_setup_cost=0
>   parallel_tuple_cost=0
>   min_parallel_table_scan_size=0
>   max_parallel_workers_per_gather=8
>   max_parallel_workers = 128
>
>   3, Start the cluster database. Use the init_test.sql script in the
> attachment to create some test tables.
>   4, Use the sub_120.sql script in the attachment with pgbench to test it.
>
>   pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800
>
>   Then you can log in to the database, and use GDB to verify it.
>   1, First use explain, to make sure forced parallelism is OK.
>
>   postgres=# explain (verbose,analyze) select count(*) from contend1;
>                                                               QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
>    Output: count(*)
>    ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
>          Output: (PARTIAL count(*))
>          Workers Planned: 8
>          Workers Launched: 0
>          ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
>                Output: PARTIAL count(*)
>                ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
>                      Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
>  Planning Time: 0.123 ms
>  Execution Time: 1075.588 ms
>
>   postgres=# select pg_backend_pid();
>    pg_backend_pid
>   ----------------
>           2182678
>
>   2, Use gdb to debug our backend process. Add the breakpoint at
> parallel.c:219 and continue.
>
>   gdb  -q -p 2182678
>   ...
>   (gdb) b parallel.c:219
>   Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
>   (gdb) c
>   Continuing.
>
>   3, In the psql client, we can execute the explain command from step 1 again.

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-14 Thread Greg Nancarrow
On Fri, May 14, 2021 at 12:25 PM Pengchengliu  wrote:
>
> Hi Greg,
>
>   Thanks for your reply and test.
>
>   When the main process gets the transaction snapshot in
> InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot
> xmin very likely follows the active snapshot xmin.
>
>   Using gdb it is easy to verify it.
>   Create the env as below:
>
>   1, Use PG13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster
> database.
>   2, Append the postgres.conf as below:
>
>   max_connections = 2000
>   parallel_setup_cost=0
>   parallel_tuple_cost=0
>   min_parallel_table_scan_size=0
>   max_parallel_workers_per_gather=8
>   max_parallel_workers = 128
>
>   3, Start the cluster database. Use the init_test.sql script in the
> attachment to create some test tables.
>   4, Use the sub_120.sql script in the attachment with pgbench to test it.
>
>   pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800
>
>   Then you can log in to the database, and use GDB to verify it.
>   1, First use explain, to make sure forced parallelism is OK.
>
>   postgres=# explain (verbose,analyze) select count(*) from contend1;
>                                                               QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
>    Output: count(*)
>    ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
>          Output: (PARTIAL count(*))
>          Workers Planned: 8
>          Workers Launched: 0
>          ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
>                Output: PARTIAL count(*)
>                ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
>                      Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
>  Planning Time: 0.123 ms
>  Execution Time: 1075.588 ms
>
>   postgres=# select pg_backend_pid();
>    pg_backend_pid
>   ----------------
>           2182678
>
>   2, Use gdb to debug our backend process. Add the breakpoint at
> parallel.c:219 and continue.
>
>   gdb  -q -p 2182678
>   ...
>   (gdb) b parallel.c:219
>   Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
>   (gdb) c
>   Continuing.
>
>   3, In the psql client, we can execute the explain command from step 1 again.
>      After we get the breakpoint in gdb, we wait a moment. Then we execute "next".
>      Use gdb to check active_snapshot and transaction_snapshot:
> active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.
>      When I test it with gdb, sometimes active_snapshot is the same as
> transaction_snapshot. Then you can try it multiple times, and before
> executing "next", try waiting a longer time.
>
>   Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
>       at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
>   219         Snapshot        transaction_snapshot = GetTransactionSnapshot();
>   (gdb) n
>   220         Snapshot        active_snapshot = GetActiveSnapshot();
>   (gdb)
>   223         oldcontext = MemoryContextSwitchTo(TopTransactionContext);
>   (gdb) p *transaction_snapshot
>   $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,
>     subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,
>     active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
>   (gdb) p *active_snapshot
>   $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
>     suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,
>     regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
>   (gdb)
>

Hi Pengcheng,

I followed all your steps.
However, I seem to get different behavior in my environment.
99% of the time, the xmin and xmax of the active_snapshot and
transaction_snapshot are the same (regardless of how long I wait at
different points after the breakpoint is hit). I've had one or two
instances where the xmax values differ. I managed to catch just one
case where there were different xmin and xmax values in the snapshots,
but this occurred just prior to 

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-13 Thread Pengchengliu
Hi Greg,

  Thanks for your reply and test.

  When the main process gets the transaction snapshot in
InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot
xmin very likely follows the active snapshot xmin.

  Using gdb it is easy to verify it.
  Create the env as below:

  1, Use PG13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.
  2, Append the postgres.conf as below:

  max_connections = 2000
  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 128

  3, Start the cluster database. Use the init_test.sql script in the attachment to
create some test tables.
  4, Use the sub_120.sql script in the attachment with pgbench to test it.

  pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800

  Then you can log in to the database, and use GDB to verify it.
  1, First use explain, to make sure forced parallelism is OK.

  postgres=# explain (verbose,analyze) select count(*) from contend1;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 8
         Workers Launched: 0
         ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
               Output: PARTIAL count(*)
               ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
                     Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
 Planning Time: 0.123 ms
 Execution Time: 1075.588 ms

 postgres=# select pg_backend_pid();
  pg_backend_pid
 ----------------
         2182678

  2, Use gdb to debug our backend process. Add the breakpoint at parallel.c:219
and continue.

  gdb  -q -p 2182678
  ...
  (gdb) b parallel.c:219
  Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
  (gdb) c
  Continuing.

  3, In the psql client, we can execute the explain command from step 1 again.
     After we get the breakpoint in gdb, we wait a moment. Then we execute "next".
     Use gdb to check active_snapshot and transaction_snapshot:
active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.
     When I test it with gdb, sometimes active_snapshot is the same as
transaction_snapshot. Then you can try it multiple times, and before
executing "next", try waiting a longer time.

  Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
      at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
  219         Snapshot        transaction_snapshot = GetTransactionSnapshot();
  (gdb) n
  220         Snapshot        active_snapshot = GetActiveSnapshot();
  (gdb)
  223         oldcontext = MemoryContextSwitchTo(TopTransactionContext);
  (gdb) p *transaction_snapshot
  $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,
    subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,
    active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
  (gdb) p *active_snapshot
  $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
    suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,
    regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
  (gdb)

Thanks
Pengcheng
 

-Original Message-
From: Greg Nancarrow  
Sent: 13 May 2021 22:15
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

 

On Thu, May 13, 2021 at 11:25 AM Pengchengliu <pengcheng...@tju.edu.cn> wrote:
>
> Hi Andres,
>  Thanks for your reply.

Er, it's Greg who has replied so far (not Andres).

>
>   And if you still cannot reproduce it in 2 minutes, could you run pgbench
> for a longer time, such as 30 or 60 minutes.
>

Actually, I did run it, multiple times, for more than 60 minutes, but
no assert/crash/coredump occurred in my environment.

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-13 Thread Greg Nancarrow
On Thu, May 13, 2021 at 11:25 AM Pengchengliu  wrote:
>

> Hi Andres,
>  Thanks for your reply.

Er, it's Greg who has replied so far (not Andres).

>
>   And if you still cannot reproduce it in 2 minutes, could you run pgbench
> for a longer time, such as 30 or 60 minutes.
>

Actually, I did run it, multiple times, for more than 60 minutes, but
no assert/crash/coredump occurred in my environment.

>
>   The parallel work process:
>
>   1, Get Snapshot and set TransactionXmin itself, in 
> ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.
>
>   2, Acooding PARALLEL_KEY_TRANSACTION_SNAPSHOT(xmin 799425, xmax 82229) from 
> main process, and set TransactionXmin 799425 in 
> ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.
>
>   3, 
> ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel
>  get the active snapshot(xmin 799162, xmax  82206) from main process, and set 
> this snapshot to scan->rs_base.rs_snapshot.
>
>   4, parallel scan begin, with active snapshot(xmin 799162, xmax  82206) and 
> TransactionXmin(799425),when scan tuple(xmin 799225) 
> SubTransGetTopmostTransaction assert got.
>
>  In  
> HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.
>

I added some logging at a couple of points in the code:
1) In the Worker process code - ParallelWorkerMain() - where it
restores the serialized transaction and active snapshots (i.e. passed
to the Worker from the main process).
2) In the HeapTupleSatisfiesMVCC() function, immediately before it
calls XidInMVCCSnapshot()

After running it for an hour, examination of the log showed that in
ALL cases, the transaction snapshot xmin,xmax was always THE SAME as
the active snapshot xmin,xmax.
(Can you verify that this occurs on your system when things are
working, prior to the coredump?)
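(For reference, the logging was along these lines - a reconstruction, not
the exact code, and with invented variable names:

    /* in ParallelWorkerMain(), after restoring both snapshots */
    elog(LOG, "worker: xact snap xmin/xmax = %u/%u, active snap xmin/xmax = %u/%u",
         tsnap->xmin, tsnap->xmax, asnap->xmin, asnap->xmax);

plus a similar elog in HeapTupleSatisfiesMVCC(), just before the
XidInMVCCSnapshot() call.)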

This is different to what you are getting in your environment (at
least, different to what you described when the problem occurs).
In your case, you say that the main process gets "the newer
transaction snapshot" - where exactly is this happening in your case?
(or this is what you don't yet know?)
Perhaps very occasionally this somehow happens on your system and
triggers the Assert (and coredump)?  I have not been able to reproduce
that on my system.

Have you reproduced this issue on any other system, using the same
steps as you provided?
I'm wondering if there might be something else in your environment
that may be influencing this problem.


Regards,
Greg Nancarrow
Fujitsu Australia




RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-12 Thread Pengchengliu
  1, Get Snapshot and set TransactionXmin itself, in
ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.

  2, Acooding PARALLEL_KEY_TRANSACTION_SNAPSHOT(xmin 799425, xmax 82229) from 
main process, and set TransactionXmin 799425 in 
ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.

  3, 
ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel 
gets the active snapshot (xmin 799162, xmax 822061) from the main process and 
sets it as scan->rs_base.rs_snapshot.

  4, The parallel scan begins with the active snapshot (xmin 799162, xmax 
822061) and TransactionXmin = 799425; when a tuple with xmin 799225 is 
scanned, the SubTransGetTopmostTransaction assert fires.

 In  
HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

  

  As the main process gets the active snapshot (xmin 799162, xmax 822061), 
which is earlier than the transaction snapshot (xmin 799425, xmax 822293), the 
parallel worker process sets TransactionXmin from the transaction snapshot but 
scans tuples with the older active snapshot.
Thanks

Pengcheng

-Original Message-
From: Greg Nancarrow  
Sent: 11 May 2021 19:08
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 11, 2021 at 11:28 AM Pengchengliu  wrote:
>
> Hi Andres,
>Reproduce steps.
>
> 1, Modify NUM_SUBTRANS_BUFFERS from 32 to 128 in the file 
> "src/include/access/subtrans.h", line 15.
> 2, configure with asserts enabled and build it.
> 3, init a new database cluster.
> 4, modify postgres.conf and add some parameters as below. As the coredump 
> comes from a parallel scan, we adjust the parallel settings to make it 
> easier to reproduce.
>
>   max_connections = 2000
>
>   parallel_setup_cost=0
>   parallel_tuple_cost=0
>   min_parallel_table_scan_size=0
>   max_parallel_workers_per_gather=8
>   max_parallel_workers = 32
>
> 5, start the database cluster.
> 6, use the script init_test.sql in the attachment to create tables.
> 7, use pgbench with the script sub_120.sql in the attachment to test it. 
> Try it a few times; you should get the coredump file.
> pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 120
>

Hi,

I had a go at reproducing your reported issue, making sure to follow all your 
steps.
Unfortunately, your script seemed to run OK with pgbench and no crash/coredump 
occurred for me (and yes, I definitely had asserts enabled).
I tried with both the 13.2 source code 
(3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script with 
pgbench twice to completion, and also did the same using the latest Postgres 
source code.

Will be interesting to see if anyone is able to reproduce your issue.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-11 Thread Greg Nancarrow
On Tue, May 11, 2021 at 11:28 AM Pengchengliu  wrote:
>
> Hi Andres,
>Reproduce steps.
>
> 1, Modify NUM_SUBTRANS_BUFFERS from 32 to 128 in the file 
> "src/include/access/subtrans.h", line 15.
> 2, configure with asserts enabled and build it.
> 3, init a new database cluster.
> 4, modify postgres.conf and add some parameters as below. As the coredump 
> comes from a parallel scan, we adjust the parallel settings to make it 
> easier to reproduce.
>
>   max_connections = 2000
>
>   parallel_setup_cost=0
>   parallel_tuple_cost=0
>   min_parallel_table_scan_size=0
>   max_parallel_workers_per_gather=8
>   max_parallel_workers = 32
>
> 5, start the database cluster.
> 6, use the script init_test.sql in the attachment to create tables.
> 7, use pgbench with the script sub_120.sql in the attachment to test it. 
> Try it a few times; you should get the coredump file.
> pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 120
>

Hi,

I had a go at reproducing your reported issue, making sure to follow
all your steps.
Unfortunately, your script seemed to run OK with pgbench and no
crash/coredump occurred for me (and yes, I definitely had asserts
enabled).
I tried with both the 13.2 source code
(3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script
with pgbench twice to completion, and also did the same using the
latest Postgres source code.

Will be interesting to see if anyone is able to reproduce your issue.

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-10 Thread Pengchengliu
Hi Andres,
   Reproduce steps.

1, Modify NUM_SUBTRANS_BUFFERS from 32 to 128 in the file 
"src/include/access/subtrans.h", line 15 (see the one-line sketch after step 7).
2, configure with asserts enabled and build it.
3, init a new database cluster.
4, modify postgres.conf and add some parameters as below. As the coredump 
comes from a parallel scan, we adjust the parallel settings to make it easier 
to reproduce.

  max_connections = 2000

  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql in the attachment to create tables.
7, use pgbench with the script sub_120.sql in the attachment to test it. Try 
it a few times; you should get the coredump file.
pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 120
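
For step 1, the change amounts to one line (a sketch; in 13.2 the constant is
defined near the top of that header):

    /* src/include/access/subtrans.h */
    #define NUM_SUBTRANS_BUFFERS    128     /* default is 32 */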

Thanks
Pengcheng


-Original Message-
From: Andres Freund  
Sent: 7 May 2021 11:55
To: Pengchengliu 
Cc: pgsql-hack...@postgresql.org
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
> 
> My last email had a formatting error and was missing some information, so I 
> am resending it.
> 
>  With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested 
> subtransactions with a parallel scan and got a coredump as below:


> So the root cause is that the parallel worker process sets TransactionXmin 
> from the later transaction snapshot, but during the parallel scan it uses 
> the older active snapshot.
> 
> This leads to the subtrans assert coredump. I don't know how to fix it. Are 
> there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund


test_script.tar
Description: Unix tar archive




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-06 Thread Andres Freund
Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
> 
> My last email had a formatting error and was missing some information, so I 
> am resending it.
> 
>  With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested 
> subtransactions with a parallel scan and got a coredump as below:


> So the root cause is that the parallel worker process sets TransactionXmin 
> from the later transaction snapshot, but during the parallel scan it uses 
> the older active snapshot.
> 
> This leads to the subtrans assert coredump. I don't know how to fix it. Are 
> there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund