Re: [HACKERS] Hot standby v5 patch assertion failure
Simon Riggs wrote:
> On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:
>> Trying out a few different scenarios I ran across this:
>>
>> 1/ Set up master and replica with replica using pg_standby
>> 2/ Create a new database (bench in my case)
>> 3/ Initialize pgbench schema size 100
>> 4/ Run with 2 clients and 1 transactions
>> 5/ Replica gets assertion failure
>
> I've been unable to reproduce this error in more than 2 days of bashing.
> The bash test I use is a pgbench variant designed to cause write
> contention, while at the same time running reads against those same
> blocks on standby, plus running parallel installcheck.
>
> I suspect now there was a problem in ProcArrayClearUnobservedXids(), so
> I clear the array each time now, whether or not we are in assert mode.
> i.e. better hygiene around reused data structures. So I *haven't*
> reworked my earlier code, just checked it all again.
>
> So, new patch enclosed. This fixes everything reported so far, plus
> another 2 bugs I found and fixed during re-test.

Patching with v5d, I can no longer reproduce this either. Excellent!

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
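Simon's fix amounts to resetting a reused array completely on every clear, instead of trusting that stale entries are already gone. The thread does not show the body of `ProcArrayClearUnobservedXids()`, so the following is only a hypothetical model: `UnobservedSet`, `N_UNOBSERVED` and the three functions are invented for illustration. `unobserved_add` returns false at the point where an assertion that the slot is empty would fire; resetting only the counter leaves old xids in the reused slots, while a full clear does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins modeled on PostgreSQL's TransactionId definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical capacity; the patch's real sizing is not shown in the thread. */
#define N_UNOBSERVED 8

/* Hypothetical reused array: slots [0, count) are logically in use. */
typedef struct
{
    TransactionId xids[N_UNOBSERVED];
    int           count;
} UnobservedSet;

/* Full clear: every slot goes back to InvalidTransactionId (all-zero bits). */
static void unobserved_clear(UnobservedSet *set)
{
    memset(set->xids, 0, sizeof(set->xids));
    set->count = 0;
}

/* Counter-only reset: cheaper, but stale xids stay in the slots. */
static void unobserved_reset_count_only(UnobservedSet *set)
{
    set->count = 0;
}

/* Append xid; returns false where an "is the slot empty?" Assert would fail. */
static bool unobserved_add(UnobservedSet *set, TransactionId xid)
{
    int index = set->count;

    assert(index < N_UNOBSERVED);
    if (TransactionIdIsValid(set->xids[index]))
        return false;           /* slot still holds data from an earlier use */
    set->xids[index] = xid;
    set->count++;
    return true;
}
```

In this model, after `unobserved_reset_count_only` the next `unobserved_add` finds the old xid in slot 0 and reports failure, whereas after `unobserved_clear` it succeeds. Clearing unconditionally costs one `memset` per reset but means assert-enabled and production builds see the same array contents.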
Re: [HACKERS] Hot standby v5 patch assertion failure
On Mon, 2008-11-03 at 06:41 +, Simon Riggs wrote:
> On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:
>> Trying out a few different scenarios I ran across this:
>>
>> CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
>> DEBUG: start recovery xid = 7002 lsn = 0/6F012EE4
>> CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
>> TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 0))), File: procarray.c, Line: 2037)
>
> OK, thanks Mark. I'll start looking at it now.

It's nice to know the exact line something fails on. I'd instrumented the
whole of the UnobservedXids code to trap failures. I've had a couple of
errors in that already during development. But what to do about it?

I'm thinking the best way to handle this is just to simplify this part of
the code some, rather than continue tweaking it. The code attempts to
optimise the removal of UnobservedXids, but that feels now like a
premature optimisation. So I can probably drop ~100 lines of code.

I'm now adding the btree logic also, as well as updating the patch to
current head. So I'll return with an updated patch as soon as all that's
done and I've run a complete re-test.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
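Simon proposes replacing the optimised removal of UnobservedXids with simpler code. The thread does not describe the underlying data structure, so this is only a sketch of what an unoptimised removal could look like, assuming a dense array of xids; `XidArray`, `MAX_UNOBSERVED` and `xid_array_remove` are hypothetical names. A linear scan plus `memmove` is O(n), which is presumably acceptable for a small array, and the vacated slot is cleared so a later insert finds it empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins modeled on PostgreSQL's TransactionId definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical capacity for illustration only. */
#define MAX_UNOBSERVED 8

/* Hypothetical dense array: entries [0, count) are valid, the rest are cleared. */
typedef struct
{
    TransactionId xids[MAX_UNOBSERVED];
    int           count;
} XidArray;

/* Remove xid with a plain linear scan; returns false if it was not present. */
static bool xid_array_remove(XidArray *arr, TransactionId xid)
{
    assert(arr->count <= MAX_UNOBSERVED);
    for (int i = 0; i < arr->count; i++)
    {
        if (arr->xids[i] == xid)
        {
            /* Shift the tail down one slot to keep the array dense. */
            memmove(&arr->xids[i], &arr->xids[i + 1],
                    (size_t) (arr->count - i - 1) * sizeof(TransactionId));
            arr->count--;
            /* Clear the vacated slot so no stale xid is left behind. */
            arr->xids[arr->count] = InvalidTransactionId;
            return true;
        }
    }
    return false;
}
```

For example, removing 7000 from an array holding 6999, 7000, 7002 leaves 6999, 7002 in the first two slots and an invalid xid in the third.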
[HACKERS] Hot standby v5 patch assertion failure
Trying out a few different scenarios I ran across this:

1/ Set up master and replica with replica using pg_standby
2/ Create a new database (bench in my case)
3/ Initialize pgbench schema size 100
4/ Run with 2 clients and 1 transactions
5/ Replica gets assertion failure

This is Postgres head from 2nd Nov (NZST) with v5 patch applied on
FreeBSD 7.1 Prerelease. Here are the last few log entries:

DEBUG: executing restore command pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5439 /data0/pgarchive/8.4 0001006F pg_xlog/RECOVERYXLOG 00010069 2standby.log
LOG: restored log file 0001006F from archive
DEBUG: RecordKnown xid 6999 parent 0 slot 99 latestObsvXid 6998 firstXid t firstSubXid f markSubtrans f
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 141396/12; new 158829/8
DEBUG: start recovery xid = 6999 lsn = 0/6F001708
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 141396/12; new 158829/8
DEBUG: removing recovery locks: slot 99
CONTEXT: xlog redo commit: 2008-11-03 11:57:18.958241+13
DEBUG: RecordKnown xid 7000 parent 0 slot 98 latestObsvXid 6999 firstXid t firstSubXid f markSubtrans f
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 32133/24; new 158828/58
DEBUG: start recovery xid = 7000 lsn = 0/6F0055C8
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 32133/24; new 158828/58
DEBUG: removing recovery locks: slot 98
CONTEXT: xlog redo commit: 2008-11-03 11:57:18.963507+13
DEBUG: removing recovery locks: slot 99
CONTEXT: xlog redo commit: 2008-11-03 11:57:18.967145+13
DEBUG: RecordKnown xid 7002 parent 0 slot 98 latestObsvXid 7000 firstXid t firstSubXid f markSubtrans f
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
DEBUG: start recovery xid = 7002 lsn = 0/6F012EE4
CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 0))), File: procarray.c, Line: 2037)
DEBUG: reaping dead processes
LOG: startup process (PID 12600) was terminated by signal 6: Abort trap
LOG: aborting startup due to startup process failure
DEBUG: proc_exit(1)
DEBUG: shmem_exit(1)
DEBUG: exit(1)

regards

Mark
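The TRAP line is the text of a failed `Assert()` after macro expansion. In PostgreSQL, `Assert(condition)` reports the string `!(condition)`, `TransactionIdIsValid(xid)` expands to `((xid) != InvalidTransactionId)`, and `InvalidTransactionId` is `((TransactionId) 0)`. Working backwards, the check at procarray.c line 2037 was presumably `Assert(!TransactionIdIsValid(UnobservedXids[index]))`, i.e. the slot about to be used must still be empty. The sketch below reproduces the logged text with simplified stand-ins for those macros; `AssertText`, `matches_trap_text` and `slot_is_empty` are hypothetical helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins modeled on PostgreSQL's definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Stringizes its argument; callers pass already-expanded text. */
#define CppAsString(identifier) #identifier

/* Hypothetical helper: the text a failing Assert(condition) reports.
 * condition is macro-expanded before CppAsString stringizes "!(condition)". */
#define AssertText(condition) CppAsString(!(condition))

/* Presumed check at procarray.c:2037; the names are only stringized, never evaluated. */
static const char *const trap_text =
    AssertText(!TransactionIdIsValid(UnobservedXids[index]));

/* True when the logged assertion text matches the reconstructed check. */
static bool matches_trap_text(const char *logged)
{
    return strcmp(logged, trap_text) == 0;
}

/* What the assertion demands at runtime: the slot still holds InvalidTransactionId. */
static bool slot_is_empty(const TransactionId *xids, int index)
{
    assert(index >= 0);
    return !TransactionIdIsValid(xids[index]);
}
```

With a GCC-style preprocessor the reconstructed string should match the TRAP text character for character, which supports reading the failure as "slot `index` already held a valid xid when the startup process tried to record an unobserved xid there".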
Re: [HACKERS] Hot standby v5 patch assertion failure
On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:
> Trying out a few different scenarios I ran across this:
>
> CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
> DEBUG: start recovery xid = 7002 lsn = 0/6F012EE4
> CONTEXT: xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
> TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 0))), File: procarray.c, Line: 2037)

OK, thanks Mark. I'll start looking at it now.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support