Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Thu, Oct 10, 2013 at 2:48 PM, Alvaro Herrera wrote:
>> Could you please give me a hint of how to check if this patch was
>> included in 9.2.5 or not?
>
> Yes, this was committed in June:
>
> commit 99ee15b315c187045a95db7b27fd9d866aea93e0
> Author: Simon Riggs
> Date:   Sun Jun 23 11:05:02 2013 +0100

Good news, thank you.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Sergey Konoplev wrote:
> On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane wrote:
> > Sergey Konoplev writes:
> >> Just curious, what is the planned date for the next minor release, and
> >> BTW where is it possible to see the roadmap for minor releases?
> >
> > There is no planned date, and certainly no "roadmap". We make minor
> > releases when the core team judges that enough (or severe enough)
> > fixes have accumulated since the last time. Historically we've averaged
> > about four minor releases a year, but that's not set in stone anywhere.
>
> Could you please give me a hint of how to check if this patch was
> included in 9.2.5 or not?

Yes, this was committed in June:

commit 99ee15b315c187045a95db7b27fd9d866aea93e0
Author: Simon Riggs
Date:   Sun Jun 23 11:05:02 2013 +0100

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane wrote:
> Sergey Konoplev writes:
>> Just curious, what is the planned date for the next minor release, and
>> BTW where is it possible to see the roadmap for minor releases?
>
> There is no planned date, and certainly no "roadmap". We make minor
> releases when the core team judges that enough (or severe enough)
> fixes have accumulated since the last time. Historically we've averaged
> about four minor releases a year, but that's not set in stone anywhere.

Could you please give me a hint of how to check if this patch was
included in 9.2.5 or not?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Hi all,

I recently noticed that I continue to receive "page allocation failure"
messages mentioning postmaster in /var/log/messages on the server where
the problem occurred.

Aug 17 23:00:51 tms2 kernel: : postmaster: page allocation failure. order:5, mode:0xd0
Aug 17 23:00:51 tms2 kernel: : Pid: 21223, comm: postmaster Not tainted 2.6.32-279.22.1.el6.x86_64 #1
Aug 17 23:00:51 tms2 kernel: : Call Trace:
Aug 17 23:00:51 tms2 kernel: : [] ? __alloc_pages_nodemask+0x77f/0x940
Aug 17 23:00:51 tms2 kernel: : [] ? kmem_getpages+0x62/0x170
Aug 17 23:00:51 tms2 kernel: : [] ? fallback_alloc+0x1ba/0x270
Aug 17 23:00:51 tms2 kernel: : [] ? cache_grow+0x2cf/0x320
Aug 17 23:00:51 tms2 kernel: : [] ? cache_alloc_node+0x99/0x160
Aug 17 23:00:51 tms2 kernel: : [] ? dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [] ? __kmalloc+0x189/0x220
Aug 17 23:00:51 tms2 kernel: : [] ? dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [] ? lock_sock_nested+0xac/0xc0
Aug 17 23:00:51 tms2 kernel: : [] ? tcp_recvmsg+0x4ca/0xe80
Aug 17 23:00:51 tms2 kernel: : [] ? inet_recvmsg+0x5a/0x90
Aug 17 23:00:51 tms2 kernel: : [] ? sock_recvmsg+0x133/0x160
Aug 17 23:00:51 tms2 kernel: : [] ? autoremove_wake_function+0x0/0x40
Aug 17 23:00:51 tms2 kernel: : [] ? do_sync_read+0xfa/0x140
Aug 17 23:00:51 tms2 kernel: : [] ? sys_recvfrom+0xee/0x180
Aug 17 23:00:51 tms2 kernel: : [] ? poll_select_set_timeout+0x8d/0xa0
Aug 17 23:00:51 tms2 kernel: : [] ? audit_syscall_entry+0x1d7/0x200
Aug 17 23:00:51 tms2 kernel: : [] ? system_call_fastpath+0x16/0x1b

Apart from these messages everything is just fine. The server is a
passive (very rarely queried) hot standby.

Any ideas what could cause this?

Thank you.

On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane wrote:
> Sergey Konoplev writes:
>> Just curious, what is the planned date for the next minor release, and
>> BTW where is it possible to see the roadmap for minor releases?
>
> There is no planned date, and certainly no "roadmap". We make minor
> releases when the core team judges that enough (or severe enough)
> fixes have accumulated since the last time. Historically we've averaged
> about four minor releases a year, but that's not set in stone anywhere.
>
>             regards, tom lane

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com
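For what it's worth, "order:5, mode:0xd0" means the kernel could not find
2^5 = 32 physically contiguous pages (128 kB with 4 kB pages) for a slab
allocation on the TCP receive path, which points at memory fragmentation
under load rather than at PostgreSQL itself. A rough sketch of how one
might quantify fragmentation from /proc/buddyinfo (the helper name and the
sample line below are illustrative, not taken from this thread):

```python
# Count free blocks at or above a given order from one /proc/buddyinfo line.
# Column N after the zone name is the number of free 2^N-page blocks; an
# order:5 allocation failure means the columns for order 5 and above were
# (at least transiently) empty for the relevant zone.
def free_blocks_at_or_above(buddyinfo_line, order):
    counts = [int(c) for c in buddyinfo_line.split()[4:]]  # cols after "Node 0, zone Normal"
    return sum(counts[order:])

# Hypothetical sample line for a fragmented Normal zone:
sample = "Node 0, zone   Normal  1046  527  128   36   11    2    0    0    0    0    0"
print(free_blocks_at_or_above(sample, 5))  # -> 2 free blocks of >= 128 kB
```

If the high-order columns stay near zero, raising vm.min_free_kbytes is
sometimes suggested as a mitigation, though that is outside this thread.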
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Sergey Konoplev writes:
> Just curious, what is the planned date for the next minor release, and
> BTW where is it possible to see the roadmap for minor releases?

There is no planned date, and certainly no "roadmap". We make minor
releases when the core team judges that enough (or severe enough)
fixes have accumulated since the last time. Historically we've averaged
about four minor releases a year, but that's not set in stone anywhere.

            regards, tom lane
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Mon, Jun 10, 2013 at 11:43 PM, Simon Riggs wrote:
> On 11 June 2013 04:36, Sergey Konoplev wrote:
>> Are there any plans to commit this patch and what version it is going
>> to be done to?
>
> I'll be committing this soon, since we're likely coming up to the next
> point release soon.

I see, thank you. Just curious, what is the planned date for the next
minor release, and BTW where is it possible to see the roadmap for
minor releases?

> Thanks for the reminder.

You are welcome.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 11 June 2013 04:36, Sergey Konoplev wrote:
> Hi,
>
> On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev wrote:
>> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund wrote:
>>> The attached patch fixes this, although I don't like the way the
>>> knowledge of the point up to which StartupSUBTRANS zeroes pages is
>>> handled.
>>
>> One month has passed since the patched version was installed in our
>> production environment and I can confirm that everything works
>> perfectly. Thank you very much for your prompt help, Andres.
>
> Are there any plans to commit this patch and what version it is going
> to be done to?
>
> Thank you.

I'll be committing this soon, since we're likely coming up to the next
point release soon.

Thanks for the reminder.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Hi,

On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund wrote:
>> The attached patch fixes this, although I don't like the way the
>> knowledge of the point up to which StartupSUBTRANS zeroes pages is
>> handled.
>
> One month has passed since the patched version was installed in our
> production environment and I can confirm that everything works
> perfectly. Thank you very much for your prompt help, Andres.

Are there any plans to commit this patch and what version it is going
to be done to?

Thank you.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund wrote:
> The attached patch fixes this, although I don't like the way the
> knowledge of the point up to which StartupSUBTRANS zeroes pages is
> handled.

One month has passed since the patched version was installed in our
production environment and I can confirm that everything works
perfectly. Thank you very much for your prompt help, Andres.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Fri, Apr 5, 2013 at 7:33 AM, Andres Freund wrote:
> Looks like it could be fixed by the patch. But that seems to imply that
> you restarted not long before that? Could you check if there's a
> different error before those?

Yes, it happened straight after a restart this time. There are no
errors in the logs before it.

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 2013-04-05 07:22:08 -0700, Sergey Konoplev wrote:
> On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund wrote:
> > Cool. It would be very helpful if you could apply the patch and verify
> > that it works, it has been written somewhat blindly. Also I am afraid
> > that at least last time there was a second bug involved.
>
> Okay, I will try to talk to admins but I am afraid it could take long.

Ok.

> > Could you show the log?
>
> 2013-04-05 17:26:31 MSK 2113 @ from [vxid: txid:0] [] LOG: database
> system was shut down in recovery at 2013-04-05 17:18:02 MSK
> 2013-04-05 17:26:32 MSK 2113 @ from [vxid: txid:0] [] LOG: entering
> standby mode
> 2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] LOG: redo
> starts at 25BD/907338F8
> 2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] LOG: file
> "pg_subtrans/28E5" doesn't exist, reading as zeroes

Looks like it could be fixed by the patch. But that seems to imply that
you restarted not long before that? Could you check if there's a
different error before those?

Greetings,

Andres Freund

PS: The tander.ru addresses seem to bounce all mail I send them...

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund wrote:
> Cool. It would be very helpful if you could apply the patch and verify
> that it works, it has been written somewhat blindly. Also I am afraid
> that at least last time there was a second bug involved.

Okay, I will try to talk to admins but I am afraid it could take long.

> Could you show the log?

2013-04-05 17:26:31 MSK 2113 @ from [vxid: txid:0] [] LOG: database system was shut down in recovery at 2013-04-05 17:18:02 MSK
2013-04-05 17:26:32 MSK 2113 @ from [vxid: txid:0] [] LOG: entering standby mode
2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 25BD/907338F8
2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] LOG: file "pg_subtrans/28E5" doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo xid assignment xtop 686136255: subxacts: 686137344 686137345 686137346 686137347 686137348 686137349 686137350 686137351 686137352 686137353 686137354 686137355 686137356 686137357 686137358 686137359 686137360 686137361 686137362 686137363 686137364 686137365 686137366 686137367 686137368 686137369 686137370 686137371 686137372 686137373 686137374 686137375 686137376 686137377 686137378 686137379 686137380 686137381 686137382 686137383 686137384 686137385 686137386 686137387 686137388 686137389 686137390 686137391 686137392 686137393 686137394 686137395 686137396 686137397 686137398 686137399 686137400 686137401 686137402 686137403 686137404 686137405 686137406 686137407
2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] LOG: file "pg_subtrans/28E5" doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo xid assignment xtop 686136255: subxacts: 686139330 686139331 686139332 686139333 686139334 686139335 686139336 686139337 686139338 686139339 686139340 686139341 686139342 686139343 686139344 686139345 686139346 686139347 686139348 686139349 686139350 686139351 686139352 686139353 686139354 686139355 686139356 686139357 686139358 686139359 686139360 686139361 686139362 686139363 686139364 686139365 686139366 686139367 686139368 686139369 686139370 686139371 686139372 686139373 686139374 686139375 686139376 686139377 686139378 686139379 686139380 686139381 686139382 686139383 686139384 686139385 686139386 686139387 686139388 686139389 686139390 686139391 686139392 686139393

[some more like this]

2013-04-05 17:26:36 MSK 2113 @ from [vxid:1/0 txid:0] [] LOG: file "pg_subtrans/28E6" doesn't exist, reading as zeroes
2013-04-05 17:26:36 MSK 2113 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo xid assignment xtop 686216055: subxacts: 686222447 686222448 686222449 686222450 686222451 686222452 686222453 686222454 686222455 686222456 686222457 686222459 686222460 686222461 686222462 686222463 686222464 686222502 686222561 686222647 686222722 686223272 686223359 686223360 686223361 686223363 686223364 686223365 686223366 686223367 686223368 686223369 686223370 686223371 686223372 686223373 686223374 686223375 686223376 686223377 686223378 686223379 686223380 686223381 686223382 686223383 686223384 686223385 686223386 686223387 686223388 686223389 686223390 686223391 686223392 686223393 686223394 686223395 686223396 686223397 686223398 686223399 686223400 686223401
2013-04-05 17:26:36 MSK 2113 @ from [vxid:1/0 txid:0] [] FATAL: could not access status of transaction 686225586
2013-04-05 17:26:36 MSK 2113 @ from [vxid:1/0 txid:0] [] DETAIL: Could not read from file "pg_subtrans/28E6" at offset 253952: Success.
2013-04-05 17:26:36 MSK 2113 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo xid assignment xtop 686225585: subxacts: 686225586 686225587 686225588 686225589 686225590 686225591 686225592 686225593 686225594 686225595 686225596 686225597 686225598 686225599 686225600 686225601 686225602 686225603 686225604 686225605 686225606 686225607 686225608 686225609 686225610 686225611 686225612 686225613 686225614 686225615 686225616 686225617 686225621 686225622 686225625 686225626 686225628 686225632 686225633 686225636 686225637 686225638 686225639 686225640 686225641 686225644 686225645 686225646 686225649 686225650 686225657 686225658 686225661 686225662 686225665 686225666 686225670 686225671 686225672 686225673 686225678 686225679 686225684 686225685
2013-04-05 17:26:36 MSK 2110 @ from [vxid: txid:0] [] LOG: startup process (PID 2113) exited with exit code 1
2013-04-05 17:26:36 MSK 2110 @ from [vxid: txid:0] [] LOG: terminating any other active server processes

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com
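As a cross-check, the file name and offset in the FATAL above follow
directly from the xid in the redo record via the subtrans page arithmetic.
A small sketch, assuming the stock build defaults of 8 kB SLRU pages, one
4-byte parent xid per transaction, and 32 pages per SLRU segment (the
helper below is illustrative Python, not PostgreSQL code):

```python
# Map a transaction id to its pg_subtrans segment file name and byte offset,
# mirroring the TransactionIdToPage / SLRU segment arithmetic (defaults:
# BLCKSZ = 8192, 4-byte entry per xid, 32 pages per segment file).
BLCKSZ = 8192
SUBTRANS_XACTS_PER_PAGE = BLCKSZ // 4   # 2048 xids per page
SLRU_PAGES_PER_SEGMENT = 32

def subtrans_location(xid):
    page = xid // SUBTRANS_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    offset = (page % SLRU_PAGES_PER_SEGMENT) * BLCKSZ
    return "%04X" % segment, offset

# The xid from the FATAL message above:
seg, off = subtrans_location(686225586)
print(seg, off)  # -> 28E6 253952, matching "pg_subtrans/28E6" at offset 253952
```

So the failed read really is the page that should hold xid 686225586's
parent mapping, consistent with that page never having been initialized.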
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 2013-04-05 07:10:12 -0700, Sergey Konoplev wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund wrote:
> > The attached patch fixes this, although I don't like the way the
> > knowledge of the point up to which StartupSUBTRANS zeroes pages is
> > handled.
>
> Thank you for the patch, Andres.
>
> Is it included in 9.2.4?

No. Too late for that. It hasn't been committed yet.

> BTW, it has happened again and I am going to make a copy of the
> cluster to be able to provide you some extra information. Do you still
> need it?

Cool. It would be very helpful if you could apply the patch and verify
that it works, it has been written somewhat blindly. Also I am afraid
that at least last time there was a second bug involved.

Could you show the log?

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund wrote:
> The attached patch fixes this, although I don't like the way the
> knowledge of the point up to which StartupSUBTRANS zeroes pages is
> handled.

Thank you for the patch, Andres.

Is it included in 9.2.4?

BTW, it has happened again and I am going to make a copy of the
cluster to be able to provide you some extra information. Do you still
need it?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 2013-04-02 12:10:12 +0200, Andres Freund wrote:
> On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> > On 30 March 2013 17:21, Andres Freund wrote:
> > > So if the xid is later than latestObservedXid we extend subtrans one by
> > > one. So far so good. But we initialize it in
> > > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> > >
> > >     latestObservedXid = running->nextXid;
> > >     TransactionIdRetreat(latestObservedXid);
> > >
> > > Before that subtrans has initially been started up with:
> > >
> > >     if (wasShutdown)
> > >         oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> > >     else
> > >         oldestActiveXID = checkPoint.oldestActiveXid;
> > >     ...
> > >     StartupSUBTRANS(oldestActiveXID);
> > >
> > > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > > can take some time till we reach consistency it seems rather plausible
> > > that there now will be a gap in initialized pages, from
> > > checkPoint.oldestActiveXid to running->nextXid, if there are pages in
> > > between.
> >
> > That was an old bug.
> >
> > StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> > does that incorrectly? How?
>
> Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
> between the call to Startup* and the first call to ExtendSUBTRANS. The
> latter is only called *after* we have reached STANDBY_INITIALIZED via
> ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
> checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
> running->nextXid - 1. There very well can be a gap in between.
> The window isn't terribly big, but if you use subtransactions as heavily
> as Sergey seems to, it doesn't seem unlikely to hit it.
>
> Let me come up with a testcase and patch.

Developing a testcase was trivial, pgbench running the following function:

CREATE OR REPLACE FUNCTION recurse_and_assign_txid(level bigint DEFAULT 0)
RETURNS bigint
LANGUAGE plpgsql AS $b$
BEGIN
    IF level < 500 THEN
        RETURN recurse_and_assign_txid(level + 1);
    ELSE
        -- assign xid in subtxn and parents
        CREATE TEMPORARY TABLE foo();
        DROP TABLE foo;
        RETURN txid_current()::bigint;
    END IF;
EXCEPTION WHEN others THEN
    RAISE NOTICE 'unexpected';
END
$b$;

When now restarting a standby (so it restarts from another checkpoint) it
frequently crashed with various errors:
* pg_subtrans/xxx does not exist
* (warning) pg_subtrans page does not exist, assuming zero
* xid overwritten in SubTransSetParent

So I think my theory is correct.

The attached patch fixes this, although I don't like the way the
knowledge of the point up to which StartupSUBTRANS zeroes pages is
handled.

Makes sense?

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

>From cbb4e4d30ba1df5a3d370226545efe4966719867 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Tue, 2 Apr 2013 20:20:00 +0200
Subject: [PATCH] Ensure that SUBTRANS is initalized gaplessly when starting up
 HS

---
 src/backend/access/transam/xlog.c   |  3 ++
 src/backend/storage/ipc/procarray.c | 53 +++
 src/include/storage/procarray.h     |  1 +
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 07c68ad..4a9462e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5391,6 +5391,9 @@ StartupXLOG(void)
 			oldestActiveXID = checkPoint.oldestActiveXid;
 		Assert(TransactionIdIsValid(oldestActiveXID));
 
+		/* Tell procarray about the range of xids it has to deal with */
+		ProcArrayInitRecovery(ShmemVariableCache->nextXid);
+
 		/*
 		 * Startup commit log and subtrans only. Other SLRUs are not
 		 * maintained during recovery and need not be started yet.

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..53a5aa2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -470,6 +470,28 @@ ProcArrayClearTransaction(PGPROC *proc)
 }
 
 /*
+ * ProcArrayInitRecovery -- initialize recovery xid mgmt environment
+ *
+ * Remember up to where the startup process initialized the CLOG and subtrans
+ * so we can ensure its initialized gaplessly up to the point where necessary
+ * while in recovery.
+ */
+void
+ProcArrayInitRecovery(TransactionId initializedUptoXID)
+{
+	Assert(standbyState == STANDBY_INITIALIZED);
+	Assert(TransactionIdIsNormal(initializedUptoXID));
+
+	/*
+	 * we set latestObservedXid to the xid SUBTRANS has been initialized upto
+	 * so we can extend it from that point onwards when we reach a consistent
+	 * state in ProcArrayApplyRe
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> On 30 March 2013 17:21, Andres Freund wrote:
> > So if the xid is later than latestObservedXid we extend subtrans one by
> > one. So far so good. But we initialize it in
> > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> >
> >     latestObservedXid = running->nextXid;
> >     TransactionIdRetreat(latestObservedXid);
> >
> > Before that subtrans has initially been started up with:
> >
> >     if (wasShutdown)
> >         oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> >     else
> >         oldestActiveXID = checkPoint.oldestActiveXid;
> >     ...
> >     StartupSUBTRANS(oldestActiveXID);
> >
> > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > can take some time till we reach consistency it seems rather plausible
> > that there now will be a gap in initialized pages, from
> > checkPoint.oldestActiveXid to running->nextXid, if there are pages in
> > between.
>
> That was an old bug.
>
> StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> does that incorrectly? How?

Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
between the call to Startup* and the first call to ExtendSUBTRANS. The
latter is only called *after* we have reached STANDBY_INITIALIZED via
ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
running->nextXid - 1. There very well can be a gap in between.

The window isn't terribly big, but if you use subtransactions as heavily
as Sergey seems to, it doesn't seem unlikely to hit it.

Let me come up with a testcase and patch.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 30 March 2013 17:21, Andres Freund wrote:
> So if the xid is later than latestObservedXid we extend subtrans one by
> one. So far so good. But we initialize it in
> ProcArrayApplyRecoveryInfo() when consistency is initially reached:
>
>     latestObservedXid = running->nextXid;
>     TransactionIdRetreat(latestObservedXid);
>
> Before that subtrans has initially been started up with:
>
>     if (wasShutdown)
>         oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
>     else
>         oldestActiveXID = checkPoint.oldestActiveXid;
>     ...
>     StartupSUBTRANS(oldestActiveXID);
>
> That means it's only initialized up to checkPoint.oldestActiveXid. As it
> can take some time till we reach consistency it seems rather plausible
> that there now will be a gap in initialized pages, from
> checkPoint.oldestActiveXid to running->nextXid, if there are pages in
> between.

That was an old bug.

StartupSUBTRANS() now explicitly fills that gap. Are you saying it
does that incorrectly? How?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
unsubscribe

On Thu, Mar 28, 2013 at 10:30 PM, Sergey Konoplev wrote:
> Hi all,
>
> A couple of days ago I found the replica stopped after the PANIC message:
>
> PANIC: WAL contains references to invalid pages
>
> When I tried to restart it I got this FATAL:
>
> FATAL: could not access status of transaction 280557568
>
> Below is the description of the server and information from PostgreSQL
> and system logs. After googling the problem I have found nothing like
> this.
>
> Any thoughts of what it could be and how to prevent it in the future?
>
> Hardware:
>
> IBM System x3650 M4, 148GB RAM, NAS
>
> Software:
>
> PostgreSQL 9.2.3, yum.postgresql.org
> CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>
> Configuration:
>
> listen_addresses = '*'
> max_connections = 550
> shared_buffers = 35GB
> work_mem = 256MB
> maintenance_work_mem = 1GB
> bgwriter_delay = 10ms
> bgwriter_lru_multiplier = 10.0
> effective_io_concurrency = 32
> wal_level = hot_standby
> synchronous_commit = off
> checkpoint_segments = 1024
> checkpoint_timeout = 1h
> checkpoint_completion_target = 0.9
> checkpoint_warning = 5min
> max_wal_senders = 3
> wal_keep_segments = 2048
> hot_standby = on
> max_standby_streaming_delay = 5min
> hot_standby_feedback = on
> effective_cache_size = 133GB
> log_directory = '/var/log/pgsql'
> log_filename = 'postgresql-%Y-%m-%d.log'
> log_checkpoints = on
> log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
> log_lock_waits = on
> log_statement = 'ddl'
> log_timezone = 'W-SU'
> track_activity_query_size = 4096
> autovacuum_max_workers = 5
> autovacuum_naptime = 5s
> autovacuum_vacuum_scale_factor = 0.05
> autovacuum_analyze_scale_factor = 0.05
> autovacuum_vacuum_cost_delay = 5ms
> datestyle = 'iso, dmy'
> timezone = 'W-SU'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'ru_RU.UTF-8'
> lc_numeric = 'ru_RU.UTF-8'
> lc_time = 'ru_RU.UTF-8'
> default_text_search_config = 'pg_catalog.russian'
>
> System:
>
> # Controls the maximum shared segment size, in bytes
> kernel.shmmax = 53287555072
>
> # Controls the maximum number of shared memory segments, in pages
> kernel.shmall = 13009657
>
> # Maximum number of file-handles
> fs.file-max = 65535
>
> # pdflush tuning to prevent lag spikes
> vm.dirty_ratio = 10
> vm.dirty_background_ratio = 1
> vm.dirty_expire_centisecs = 499
>
> # Prevent the scheduler breakdown
> kernel.sched_migration_cost = 500
>
> # Turned off to provide more CPU to PostgreSQL
> kernel.sched_autogroup_enabled = 0
>
> # Setup hugepages
> vm.hugetlb_shm_group = 26
> vm.hugepages_treat_as_movable = 0
> vm.nr_overcommit_hugepages = 512
>
> # The Huge Page Size is 2048kB, so for 35GB shared buffers the number is 17920
> vm.nr_hugepages = 17920
>
> # Turn off the NUMA local pages reclaim as it leads to wrong caching
> strategy for databases
> vm.zone_reclaim_mode = 0
>
> Environment:
>
> HUGETLB_SHM=yes
> LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
> export HUGETLB_SHM LD_PRELOAD
>
> When it is stopped:
>
> 2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG:
> restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
> log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
> sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
> average=0.000 s
> 2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: recovery
> restart point at 2538/6E154AC0
> 2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] DETAIL: last
> completed transaction was at log time 2013-03-26 11:50:31.613948+04
> 2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG:
> restartpoint starting: xlog
> 2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] WARNING:
> page 451 of relation base/16436/2686702648 is uninitialized
> 2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
> lastBlockVacuumed 0
> 2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] PANIC: WAL
> contains references to invalid pages
> 2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
> lastBlockVacuumed 0
> 2013-03-26 11:51:16 MSK 3770 @ from [vxid: txid:0] [] LOG: startup
> process (PID 3773) was terminated by signal 6: Aborted
> 2013-03-26 11:51:16 MSK 3770 @ from [vxid: txid:0] [] LOG:
> terminating any other active server processes
>
> From /var/log/messages:
>
> Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
> order:8, mode:0xd0
> Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
> 2.6.32-279.22.1.el6.x86_64 #1
> Mar 26 10:50:52 tms2 kernel: : Call Trace:
> Mar 26 10:50:52 tms2 kernel: : [] ? __alloc_pages_nodemask+0x77f/0x940
> Mar 26 10:50:52 tms2 kernel: : [] ? kmem_getpages+0x62/0x170
> Mar 26 10:50:52 tms2 kernel: : [] ? fallback_alloc+0x1ba/0x270
> Mar 26 10:50:52 tms2 kernel: : [] ? cache_grow+0x2cf/0x320
> Mar 26 10:50:52 tms2 kernel: : [] ? cache_alloc_node+0x99/0x160
> Mar 26
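The vm.nr_hugepages value in the quoted configuration is simply
shared_buffers divided by the 2048 kB huge page size: 35 GB / 2 MB = 17920
pages. A one-line sketch of that arithmetic (illustrative only; the helper
name is not from this thread):

```python
# Number of 2048 kB huge pages needed to back a given shared_buffers size,
# as used for the vm.nr_hugepages setting in the configuration above.
def nr_hugepages(shared_buffers_gb, hugepage_kb=2048):
    return shared_buffers_gb * 1024 * 1024 // hugepage_kb

print(nr_hugepages(35))  # -> 17920, the configured vm.nr_hugepages
```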
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 2013-03-29 14:53:26 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund wrote:
> > I have to admit, I find it a bit confusing that so many people report a
> > bug and then immediately destroy all evidence of the bug. Just seems to
> > happen a bit too frequently.
>
> You see, businesses usually need it up ASAP again. Sorry, I should have
> noted down the output of pg_controldata right after it got broken; I
> just have not gotten around to it.

But the business will also need the standby working correctly in case of
a critical incident on the primary, so it should have quite an interest
in fixing bugs in that area. Yes, I realize that's not always easy to
do :(.

> > That's not a pg_controldata output from the broken replica though, or is
> > it? I guess it's from a new standby?
>
> That was the output from the standby that was rsync-ed on top of the
> broken one. I thought you might find something useful in it.
> Can I test your guess some other way? And what was the guess?

I don't think you can easily test it. And after reading more code I am
pretty sure my original guess was bogus. As was my second. And third ;).
But I think I see what could be going on:

During HS we maintain pg_subtrans so we can deal with more than
PGPROC_MAX_CACHED_SUBXIDS subtransactions in one TX. For that we need to
regularly extend subtrans so the pages are initialized by the time we
set up the topxid<->subxid mapping in ProcArrayApplyXidAssignment().

The call to ExtendSUBTRANS() happens in
RecordKnownAssignedTransactionIds(), which is called from several
places, including ProcArrayApplyXidAssignment(). The logic it uses is:

    if (TransactionIdFollows(xid, latestObservedXid))
    {
        TransactionId next_expected_xid;

        /*
         * Extend clog and subtrans like we do in GetNewTransactionId()
         * during normal operation using individual extend steps. Typical
         * case requires almost no activity.
         */
        next_expected_xid = latestObservedXid;
        TransactionIdAdvance(next_expected_xid);
        while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
        {
            ExtendCLOG(next_expected_xid);
            ExtendSUBTRANS(next_expected_xid);
            TransactionIdAdvance(next_expected_xid);
        }
    }

So if the xid is later than latestObservedXid we extend subtrans one xid
at a time. So far so good. But latestObservedXid is only initialized in
ProcArrayApplyRecoveryInfo(), when consistency is initially reached:

    latestObservedXid = running->nextXid;
    TransactionIdRetreat(latestObservedXid);

Before that, subtrans has initially been started up with:

    if (wasShutdown)
        oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
    else
        oldestActiveXID = checkPoint.oldestActiveXid;
    ...
    StartupSUBTRANS(oldestActiveXID);

That means subtrans is only initialized up to
checkPoint.oldestActiveXid. As it can take some time until we reach
consistency, it seems rather plausible that there will be a gap in
initialized pages, from checkPoint.oldestActiveXid to running->nextXid,
if there are pages in between.

Does that explanation sound about right to anybody else?

I'll provide a patch for the issue in a while; for now I'll try to
reproduce it.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
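The window Andres describes above can be modeled with a short sketch. Python is used purely for illustration; XIDS_PER_PAGE is an assumption standing in for BLCKSZ / sizeof(TransactionId) in a default build, and the function names mirror, not reproduce, the backend code:

```python
# Illustrative model of the pg_subtrans initialization gap described
# above. XIDS_PER_PAGE is an assumption (8192-byte pages, 4-byte xids).
XIDS_PER_PAGE = 2048

def page_of(xid):
    return xid // XIDS_PER_PAGE

def initialized_pages(oldest_active_xid, next_xid_at_consistency, last_assigned_xid):
    """Model which subtrans pages get zeroed during standby startup."""
    # StartupSUBTRANS(oldestActiveXID): only the page holding that xid
    # is initialized at startup.
    pages = {page_of(oldest_active_xid)}
    # ProcArrayApplyRecoveryInfo: latestObservedXid = running->nextXid,
    # then TransactionIdRetreat, so extension only covers later xids.
    latest_observed = next_xid_at_consistency - 1
    for xid in range(latest_observed + 1, last_assigned_xid + 1):
        pages.add(page_of(xid))  # ExtendSUBTRANS zeroes new pages
    return pages

# If many xids were consumed between the checkpoint and reaching
# consistency, whole pages in between are never initialized:
pages = initialized_pages(oldest_active_xid=10_000,
                          next_xid_at_consistency=20_000,
                          last_assigned_xid=20_050)
gap = [p for p in range(page_of(10_000), page_of(20_050) + 1)
       if p not in pages]
print(gap)  # → [5, 6, 7, 8]: the never-zeroed pages behind the PANIC
```

Accessing a subxid whose mapping lands on one of the gap pages would then fail, matching the "could not access status of transaction" symptom.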
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund wrote:
> I have to admit, I find it a bit confusing that so many people report a
> bug and then immediately destroy all evidence of the bug. Just seems to
> happen a bit too frequently.

You see, businesses usually need it up ASAP again. Sorry, I should have
noted down the output of pg_controldata right after it got broken; I
just have not gotten around to it.

> That's not a pg_controldata output from the broken replica though, or is
> it? I guess it's from a new standby?

That was the output from the standby that was rsync-ed on top of the
broken one. I thought you might find something useful in it. Can I test
your guess some other way? And what was the guess?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 2013-03-29 14:19:59 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de
> wrote:
> > I think I see what's going on. Do you still have the datadir available? If
> > so, could you send the pg_controldata output?
>
> I have already rebuilt the replica, however below is the output if it is
> useful:

I have to admit, I find it a bit confusing that so many people report a
bug and then immediately destroy all evidence of the bug. Just seems to
happen a bit too frequently.

> pg_control version number:            922
> Catalog version number:               201204301
> Database system identifier:           5858109675396804534
> Database cluster state:               in archive recovery
> pg_control last modified:             Sat 30 Mar 2013 00:21:11
> Latest checkpoint location:           258B/BDBBE748
> Prior checkpoint location:            258B/86DABCB8
> Latest checkpoint's REDO location:    258B/8B78BED0
> Latest checkpoint's TimeLineID:       2
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID:          0/423386899
> Latest checkpoint's NextOID:          2758636912
> Latest checkpoint's NextMultiXactId:  103920
> Latest checkpoint's NextMultiOffset:  431309
> Latest checkpoint's oldestXID:        225634745
> Latest checkpoint's oldestXID's DB:   16436
> Latest checkpoint's oldestActiveXID:  421766298
> Time of latest checkpoint:            Fri 29 Mar 2013 22:33:01

That's not a pg_controldata output from the broken replica though, or is
it? I guess it's from a new standby?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
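Once pg_controldata output like the above has been saved, individual fields are easy to pull out programmatically when comparing reports. A minimal sketch (Python purely for illustration; the sample lines are copied from the output quoted in this thread):

```python
# Parse saved pg_controldata output into a dict of field -> value.
# Sample lines are copied from the output quoted in this thread.
controldata = """\
Database cluster state:               in archive recovery
Latest checkpoint's TimeLineID:       2
Latest checkpoint's oldestActiveXID:  421766298
"""

fields = {}
for line in controldata.splitlines():
    # Each line is "Name:<padding>value"; split on the first colon.
    name, _, value = line.partition(":")
    fields[name] = value.strip()

print(fields["Latest checkpoint's oldestActiveXID"])  # → 421766298
```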
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de wrote:
> I think I see what's going on. Do you still have the datadir available? If so,
> could you send the pg_controldata output?

I have already rebuilt the replica, however below is the output if it is useful:

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5858109675396804534
Database cluster state:               in archive recovery
pg_control last modified:             Sat 30 Mar 2013 00:21:11
Latest checkpoint location:           258B/BDBBE748
Prior checkpoint location:            258B/86DABCB8
Latest checkpoint's REDO location:    258B/8B78BED0
Latest checkpoint's TimeLineID:       2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/423386899
Latest checkpoint's NextOID:          2758636912
Latest checkpoint's NextMultiXactId:  103920
Latest checkpoint's NextMultiOffset:  431309
Latest checkpoint's oldestXID:        225634745
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  421766298
Time of latest checkpoint:            Fri 29 Mar 2013 22:33:01
Minimum recovery ending location:     258C/14AA5FA0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

>
> Andres
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Hi,

Sergey Konoplev wrote:
>Hi all,
>
>A couple of days ago I found the replica stopped after the PANIC
>message:
>
>PANIC: WAL contains references to invalid pages
>
>When I tried to restart it I got this FATAL:
>
>FATAL: could not access status of transaction 280557568
>
>Below is the description of the server and information from PostgreSQL
>and system logs. After googling the problem I have found nothing like
>this.
>
>Any thoughts of what it could be and how to prevent it in the future?

I think I see what's going on. Do you still have the datadir available?
If so, could you send the pg_controldata output?

Andres
[BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Hi all,

A couple of days ago I found the replica stopped after the PANIC message:

PANIC: WAL contains references to invalid pages

When I tried to restart it I got this FATAL:

FATAL: could not access status of transaction 280557568

Below is the description of the server and information from PostgreSQL
and system logs. After googling the problem I have found nothing like
this.

Any thoughts of what it could be and how to prevent it in the future?

Hardware:

IBM System x3650 M4, 148GB RAM, NAS

Software:

PostgreSQL 9.2.3, yum.postgresql.org
CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

Configuration:

listen_addresses = '*'
max_connections = 550
shared_buffers = 35GB
work_mem = 256MB
maintenance_work_mem = 1GB
bgwriter_delay = 10ms
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 32
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 1024
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 5min
max_wal_senders = 3
wal_keep_segments = 2048
hot_standby = on
max_standby_streaming_delay = 5min
hot_standby_feedback = on
effective_cache_size = 133GB
log_directory = '/var/log/pgsql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_checkpoints = on
log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'W-SU'
track_activity_query_size = 4096
autovacuum_max_workers = 5
autovacuum_naptime = 5s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 5ms
datestyle = 'iso, dmy'
timezone = 'W-SU'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'

System:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 53287555072

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 13009657

# Maximum number of file-handles
fs.file-max = 65535

# pdflush tuning to prevent lag spikes
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 499

# Prevent the scheduler breakdown
kernel.sched_migration_cost = 500

# Turned off to provide more CPU to PostgreSQL
kernel.sched_autogroup_enabled = 0

# Setup hugepages
vm.hugetlb_shm_group = 26
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 512

# The Huge Page Size is 2048kB, so for 35GB shared buffers the number is 17920
vm.nr_hugepages = 17920

# Turn off the NUMA local pages reclaim as it leads to wrong caching
# strategy for databases
vm.zone_reclaim_mode = 0

Environment:

HUGETLB_SHM=yes
LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
export HUGETLB_SHM LD_PRELOAD

When it is stopped:

2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: restartpoint
complete: wrote 1685004 buffers (36.7%); 0 transaction log file(s)
added, 0 removed, 555 recycled; write=3237.402 s, sync=0.071 s,
total=3237.507 s; sync files=2673, longest=0.008 s, average=0.000 s
2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: recovery
restart point at 2538/6E154AC0
2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] DETAIL: last
completed transaction was at log time 2013-03-26 11:50:31.613948+04
2013-03-26 11:50:32 MSK 3775 @ from [vxid: txid:0] [] LOG: restartpoint
starting: xlog
2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] WARNING: page
451 of relation base/16436/2686702648 is uninitialized
2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog
redo vacuum: rel 1663/16436/2686702648; blk 2485, lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] PANIC: WAL
contains references to invalid pages
2013-03-26 11:51:16 MSK 3773 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog
redo vacuum: rel 1663/16436/2686702648; blk 2485, lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3770 @ from [vxid: txid:0] [] LOG: startup
process (PID 3773) was terminated by signal 6: Aborted
2013-03-26 11:51:16 MSK 3770 @ from [vxid: txid:0] [] LOG: terminating
any other active server processes

From /var/log/messages:

Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
order:8, mode:0xd0
Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
2.6.32-279.22.1.el6.x86_64 #1
Mar 26 10:50:52 tms2 kernel: : Call Trace:
Mar 26 10:50:52 tms2 kernel: : [] ? __alloc_pages_nodemask+0x77f/0x940
Mar 26 10:50:52 tms2 kernel: : [] ? kmem_getpages+0x62/0x170
Mar 26 10:50:52 tms2 kernel: : [] ? fallback_alloc+0x1ba/0x270
Mar 26 10:50:52 tms2 kernel: : [] ? cache_grow+0x2cf/0x320
Mar 26 10:50:52 tms2 kernel: : [] ? cache_alloc_node+0x99/0x160
Mar 26 10:50:52 tms2 kernel: : [] ? dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [] ? __kmalloc+0x189/0x220
Mar 26 10:50:52 tms2 kernel: : [] ? dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [] ? lock_sock_nested+0xac/0xc0
Mar 26 10:50:52 tms2 kernel: : [] ? tcp_recvmsg+0x4ca/0xe80
Mar 26 10:50:52 tms2 kernel: : [] ? xfs_vm_wri