Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-24 Thread Greg Nancarrow
On Mon, May 24, 2021 at 11:56 PM Pavel Borisov  wrote:
>
> Using a recipe similar to what has been described above in the thread, I 
> reliably reproduced the bug in many Postgres versions. (v.11, v.13 etc.).
> 1. Make & make install
> 2. Make check
> 3. run  SubTransGetTopmostTransaction-rep.sh in the Postgres source code dir.
>

Thanks, using the latest PG14 code
(713a431c781fbfe1a22fae4991836077f0f4c513), with no modifications, and
building with "--enable-cassert", your script reproduced the coredump
on my system in about 10 minutes, on the 1st run.

#0  0x7f1542d1137f in raise () from /lib64/libc.so.6
#1  0x7f1542cfbdb5 in abort () from /lib64/libc.so.6
#2  0x009380cb in ExceptionalCondition ()
#3  0x005aadbc in SubTransGetTopmostTransaction ()
#4  0x009741ef in XidInMVCCSnapshot ()
#5  0x00563326 in HeapTupleSatisfiesVisibility ()
#6  0x00559d50 in heapgetpage ()
#7  0x0055a0e9 in heapgettup_pagemode ()
#8  0x0055b98e in heap_getnextslot ()
#9  0x006df713 in SeqNext ()
#10 0x006b725f in ExecScan ()
#11 0x006bef41 in fetch_input_tuple ()
#12 0x006c1f6f in ExecAgg ()
#13 0x006ade83 in standard_ExecutorRun ()
#14 0x006b21ed in ParallelQueryMain ()
#15 0x005a8c11 in ParallelWorkerMain ()
#16 0x00795f3c in StartBackgroundWorker ()
#17 0x007a1bc1 in maybe_start_bgworkers ()
#18 0x007a25c5 in sigusr1_handler ()
#19 
#20 0x7f1542dce25b in select () from /lib64/libc.so.6
#21 0x007a2cf6 in ServerLoop ()
#22 0x007a4659 in PostmasterMain ()
#23 0x005042ed in main ()

(Using a debug-build, it seems much harder to reproduce on my system)

With my patch applied, I've had no crashes, from many runs.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-24 Thread Pavel Borisov
пн, 24 мая 2021 г. в 09:22, Greg Nancarrow :

> On Mon, May 24, 2021 at 2:50 PM Michael Paquier 
> wrote:
> >
> > On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > > Keep cfbot happy, use the PG14 patch as latest.
> >
> > This stuff is usually very tricky.
>
> Agreed. That's why I was looking for experts in this snapshot-handling
> code, to look closer at this issue, check my proposed fix, come up
> with a better solution etc.
>
> >Do we have a way to reliably
> > reproduce the report discussed here?
>

Using a recipe similar to what has been described above in the thread, I
reliably reproduced the bug in many Postgres versions. (v.11, v.13 etc.).
1. Make & make install
2. Make check
3. run  SubTransGetTopmostTransaction-rep.sh in the Postgres source code
dir.

The test fails with coredumps in around 10 minutes. With applied fix has
never failed yet.
(Though transaction snapshots kitchen is indeed tricky and I am not 100%
sure the fix does right thing which is safe in all circumstances)
PGROOT=`pwd`/tmp_install
PGDB=`pwd`/tmpdb
PGBIN=$PGROOT/usr/local/pgsql/bin

export PATH="$PGBIN:$PATH"
export LD_LIBRARY_PATH="$PGROOT/usr/local/pgsql/lib"
rm -rf "$PGDB"; $PGBIN/initdb -D "$PGDB"

#echo -e "session_pool_size=2" >> $PGDB/postgresql.auto.conf
echo "
#  fsync=off
max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

" >> $PGDB/postgresql.auto.conf

$PGBIN/pg_ctl -w -t 5 -D "$PGDB" -l server.log start
$PGBIN/createdb test
export PGDATABASE=test
$PGBIN/psql -f init_test.sql
$PGBIN/psql -t -c "SELECT version()"
# $PGBIN/pgbench -n -r -f .../sub_120.sql -c 200 -j 200 -T 120
$PGBIN/pgbench -n -r -f sub_120.sql -c 25 -j 25 -T 1800
$PGBIN/pg_ctl -w -t 5 -D "$PGDB" stop
sleep 2
coredumpctl


init_test.sql
Description: Binary data


sub_120.sql
Description: Binary data


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-23 Thread Greg Nancarrow
On Mon, May 24, 2021 at 2:50 PM Michael Paquier  wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code, to look closer at this issue, check my proposed fix, come up
with a better solution etc.

>Do we have a way to reliably
> reproduce the report discussed here?

I couldn't reproduce it in my environment (though I could understand
what was going wrong, based on the description provided).
houzj (houzj.f...@fujitsu.com) was able to reproduce it in his
environment and kindly provided to me the following information:
(He said that he followed most of the steps described by the original
problem reporter, Pengcheng, but perhaps steps 2 and 7 are a little
different from his steps. See the emails higher in the thread for the
two scripts "init_test.sql" and "sub_120.sql")

===

1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file
"src/include/access/subtrans.h" line number 15.
2, configure with enable assert and build it.( ./configure
--enable-cassert --prefix=/home/pgsql)
3, init a new database cluster.
4, modify  postgres.conf  and add some parameters as below. As the
coredump from parallel scan, so we adjust parallel setting, make it
easy to reproduce.

  max_connections = 2000

  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql  in attachment to create tables.
7, use pgbench with script sub_120.sql in attachment to test it. Try
it sometimes, you should get the coredump file.
pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 12000
   (If cannot reproduce it, maybe you can try run two parallel pgbench
xx at the same time)

In my environment(CentOS 8.2, 128G RAM, 40 processors, disk SAS
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz),
sometimes I can reproduce in about 5 minutes , but sometimes it needs
about half an hour.

Best regards,
houzj

===

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-23 Thread Michael Paquier
On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> Keep cfbot happy, use the PG14 patch as latest.

This stuff is usually very tricky.  Do we have a way to reliably
reproduce the report discussed here?
--
Michael


signature.asc
Description: PGP signature


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-23 Thread Greg Nancarrow
On Thu, May 20, 2021 at 4:08 PM Greg Nancarrow  wrote:
>

Keep cfbot happy, use the PG14 patch as latest.

Regards,
Greg Nancarrow
Fujitsu Australia


v2-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-20 Thread Greg Nancarrow
On Thu, May 20, 2021 at 11:18 AM Pengchengliu  wrote:
>
> Hi Greg,
>Thanks a lot for you explanation and your fix.
>
>I think your fix can resolve the core dump issue. As with your fix, 
> parallel process reset Transaction Xmin from ActiveSnapshot.
>But it will change Transaction snapshot for all parallel  scenarios. I 
> don't know whether it bring in other issue.
>For test only, I think it is enough.
>
>So is there anybody can explain what's exactly difference between 
> ActiveSnapshot and TransactionSnapshot in parallel work process.
>Then maybe we can find a better solution and try to fix it really.
>

I am proposing the attached patch to fix this issue (patches for both
PG13.2 and latest PG14 source are provided).

Perhaps this will trigger others who better know the intricacies of
snapshot handling, and know the reasons and history behind why the
transaction_snapshot and active_snapshot are currently passed
separately to parallel workers, to speak up here and discuss the issue
further, and check my fix (and note, I haven't attempted to modify
README.parallel in the patch until I get further clarification on this
issue).
Perhaps someone can explain the purpose of calling
GetTransactionSnapshot() AGAIN in InitializeParallelDSM() and how is
this consistent with the current ActiveSnapshot?
AFAICS, that doesn't seem correct, and that's why the patch removes it.

I've rebuilt Postgres with the patch applied and run the regression
tests, with and without "force_parallel_mode=regress", and all tests
pass.
So if the fix is wrong, no tests currently exist that detect issues with it.

Regards,
Greg Nancarrow
Fujitsu Australia


v1-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


v1-0001-PG13_2-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data


RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-19 Thread Pengchengliu
Hi Greg,
   Thanks a lot for you explanation and your fix.
   
   I think your fix can resolve the core dump issue. As with your fix, parallel 
process reset Transaction Xmin from ActiveSnapshot. 
   But it will change Transaction snapshot for all parallel  scenarios. I don't 
know whether it bring in other issue. 
   For test only, I think it is enough. 

   So is there anybody can explain what's exactly difference between 
ActiveSnapshot and TransactionSnapshot in parallel work process. 
   Then maybe we can find a better solution and try to fix it really.

Thanks
Pengcheng 

-Original Message-
From: Greg Nancarrow  
Sent: 2021年5月18日 17:15
To: Pengchengliu 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert 
coredump

On Tue, May 18, 2021 at 11:27 AM Pengchengliu  wrote:
>
> Hi Greg,
>
>Actually I am very confused about ActiveSnapshot and TransactionSnapshot. 
> I don't know why main process send ActiveSnapshot and TransactionSnapshot 
> separately.  And what is exact difference between them?
>If you know that, could you explain that for me? It will be very 
> appreciated.

In the context of a parallel-worker, I am a little confused too, so I can't 
explain it either.
It is not really explained in the file
"src\backend\access\transam\README.parallel", it only mentions the following as 
part of the state that needs to be copied to each worker:

 - The transaction snapshot.
 - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to plan 
execution, GetTransactionSnapshot() was called and its return value was stored 
in both the QueryDesc and the estate (es_snapshot), which was then pushed on 
the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the last 
snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is not 
clear to me (because isn't then the ActiveSnapshot a potentially earlier 
snapshot? - which it shouldn't be, AFAIK. And also, it's then different to the 
non-parallel case).

>Before we know them exactly, I think we should not modify the 
> TransactionSnapshot to ActiveSnapshot in main process. If it is, the main 
> process should send ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have a 
development environment, which I assume you have). Sure, IF it was deemed a 
proper solution, you'd only send the one snapshot, and adjust accordingly in 
ParallelWorkerMain(), but we need not worry about that in order to test it.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-18 Thread Greg Nancarrow
On Tue, May 18, 2021 at 11:27 AM Pengchengliu  wrote:
>
> Hi Greg,
>
>Actually I am very confused about ActiveSnapshot and TransactionSnapshot. 
> I don't know why main process send ActiveSnapshot and TransactionSnapshot 
> separately.  And what is exact difference between them?
>If you know that, could you explain that for me? It will be very 
> appreciated.

In the context of a parallel-worker, I am a little confused too, so I
can't explain it either.
It is not really explained in the file
"src\backend\access\transam\README.parallel", it only mentions the
following as part of the state that needs to be copied to each worker:

 - The transaction snapshot.
 - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to
plan execution, GetTransactionSnapshot() was called and its return
value was stored in both the QueryDesc and the estate (es_snapshot),
which was then pushed on the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the
last snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is
not clear to me (because isn't then the ActiveSnapshot a potentially
earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then
different to the non-parallel case).

>Before we know them exactly, I think we should not modify the 
> TransactionSnapshot to ActiveSnapshot in main process. If it is, the main 
> process should send ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have
a development environment, which I assume you have). Sure, IF it was
deemed a proper solution, you'd only send the one snapshot, and adjust
accordingly in ParallelWorkerMain(), but we need not worry about that
in order to test it.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-18 Thread Pavel Borisov
I've also seen the reports of the same Assert(TransactionIdFollowsOrEquals(xid,
TransactionXmin)) with a subsequent crash in a parallel worker in
PostgreSQL v11-based build, Though I was unable to investigate deeper and
reproduce the issue. The details above in the thread make me think it is a
real and long-time-persistent error that is surely worth to be fixed.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-17 Thread Pengchengliu
Hi Greg,
  > I actually think that the Assert in SubTransGetTopmostTransaction() is 
correct, but in the parallel-worker case, the snapshots are not being setup 
correctly.
  
   I agree with you that Assert in SubTransGetTopmostTransaction() is correct.  
The root cause is Transaction Xmin are not being setup correctly in the 
parallel-worker.
   
   Actually I am very confused about ActiveSnapshot and TransactionSnapshot. I 
don't know why main process send ActiveSnapshot and TransactionSnapshot 
separately.  And what is exact difference between them?
   If you know that, could you explain that for me? It will be very 
appreciated. 
   Before we know them exactly, I think we should not modify the 
TransactionSnapshot to ActiveSnapshot in main process. If it is, the main 
process should send ActiveSnapshot only.  

Thanks
Pengcheng
   
   

-Original Message-
From: Greg Nancarrow  
Sent: 2021年5月17日 20:59
To: 刘鹏程 
Cc: Andres Freund ; PostgreSQL-development 

Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert 
coredump

On Sat, May 15, 2021 at 12:37 PM 刘鹏程  wrote:
>
>
>   BTW, I test it in a high performance server. It is verly easily be 
> reproduced. My colleague and me use different environment both can reproduce 
> it.
>

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can certainly see 
how, for the snapshots used in the parallel worker case, the Active snapshot 
used is potentially an earlier snapshot that the Transaction snapshot. I don't 
know why it is getting a newer Transaction snapshot in InitializeParallelDSM(), 
when it has previously pushed the return value of GetTransactionSnapshot() as 
the Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how to 
resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is correct, 
but in the parallel-worker case, the snapshots are not being setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

  /* We might be running in a very short-lived memory context. */




Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-05-17 Thread Greg Nancarrow
On Sat, May 15, 2021 at 12:37 PM 刘鹏程  wrote:
>
>
>   BTW, I test it in a high performance server. It is verly easily be 
> reproduced. My colleague and me use different environment both can reproduce 
> it.
>

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can
certainly see how, for the snapshots used in the parallel worker case,
the Active snapshot used is potentially an earlier snapshot that the
Transaction snapshot. I don't know why it is getting a newer
Transaction snapshot in InitializeParallelDSM(), when it has
previously pushed the return value of GetTransactionSnapshot() as the
Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how
to resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is
correct, but in the parallel-worker case, the snapshots are not being
setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

  /* We might be running in a very short-lived memory context. */