Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-10-10 Thread Sergey Konoplev
On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Sergey Konoplev gray...@gmail.com writes:
 Just curious, what is the planned date for the next minor release, and
 BTW where is it possible to see the roadmap for minor releases?

 There is no planned date, and certainly no roadmap.  We make minor
 releases when the core team judges that enough (or severe enough)
 fixes have accumulated since the last time.  Historically we've averaged
 about four minor releases a year, but that's not set in stone anywhere.

Could you please give me a hint of how to check if this patch was
included in 9.2.5 or not?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-10-10 Thread Alvaro Herrera
Sergey Konoplev escribió:
 On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Sergey Konoplev gray...@gmail.com writes:
  Just curious, what is the planned date for the next minor release, and
  BTW where is it possible to see the roadmap for minor releases?
 
  There is no planned date, and certainly no roadmap.  We make minor
  releases when the core team judges that enough (or severe enough)
  fixes have accumulated since the last time.  Historically we've averaged
  about four minor releases a year, but that's not set in stone anywhere.
 
 Could you please give me a hint of how to check if this patch was
 included in 9.2.5 or not?

Yes, this was committed in June:

commit 99ee15b315c187045a95db7b27fd9d866aea93e0
Author: Simon Riggs si...@2ndquadrant.com
Date:   Sun Jun 23 11:05:02 2013 +0100

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-10-10 Thread Sergey Konoplev
On Thu, Oct 10, 2013 at 2:48 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Could you please give me a hint of how to check if this patch was
 included in 9.2.5 or not?

 Yes, this was committed in June:

 commit 99ee15b315c187045a95db7b27fd9d866aea93e0
 Author: Simon Riggs si...@2ndquadrant.com
 Date:   Sun Jun 23 11:05:02 2013 +0100

Good news, thank you.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-08-19 Thread Sergey Konoplev
Hi all,

I recently noticed that I continue to receive allocation failure
messages from postmaster in /var/log/messages on the server where the
problem occurred.

Aug 17 23:00:51 tms2 kernel: : postmaster: page allocation failure.
order:5, mode:0xd0
Aug 17 23:00:51 tms2 kernel: : Pid: 21223, comm: postmaster Not
tainted 2.6.32-279.22.1.el6.x86_64 #1
Aug 17 23:00:51 tms2 kernel: : Call Trace:
Aug 17 23:00:51 tms2 kernel: : [8112343f] ?
__alloc_pages_nodemask+0x77f/0x940
Aug 17 23:00:51 tms2 kernel: : [8115d3e2] ? kmem_getpages+0x62/0x170
Aug 17 23:00:51 tms2 kernel: : [8115dffa] ? fallback_alloc+0x1ba/0x270
Aug 17 23:00:51 tms2 kernel: : [8115da4f] ? cache_grow+0x2cf/0x320
Aug 17 23:00:51 tms2 kernel: : [8115dd79] ?
cache_alloc_node+0x99/0x160
Aug 17 23:00:51 tms2 kernel: : [813fd455] ?
dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [8115eb49] ? __kmalloc+0x189/0x220
Aug 17 23:00:51 tms2 kernel: : [813fd455] ?
dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [8141a47c] ? lock_sock_nested+0xac/0xc0
Aug 17 23:00:51 tms2 kernel: : [8146edaa] ? tcp_recvmsg+0x4ca/0xe80
Aug 17 23:00:51 tms2 kernel: : [8148f90a] ? inet_recvmsg+0x5a/0x90
Aug 17 23:00:51 tms2 kernel: : [81418b93] ? sock_recvmsg+0x133/0x160
Aug 17 23:00:51 tms2 kernel: : [81090be0] ?
autoremove_wake_function+0x0/0x40
Aug 17 23:00:51 tms2 kernel: : [8117660a] ? do_sync_read+0xfa/0x140
Aug 17 23:00:51 tms2 kernel: : [81418d0e] ? sys_recvfrom+0xee/0x180
Aug 17 23:00:51 tms2 kernel: : [8118b09d] ?
poll_select_set_timeout+0x8d/0xa0
Aug 17 23:00:51 tms2 kernel: : [810d3f47] ?
audit_syscall_entry+0x1d7/0x200
Aug 17 23:00:51 tms2 kernel: : [8100b072] ?
system_call_fastpath+0x16/0x1b

Except these messages everything is just fine. The server is a passive
(very rarely queried) hot standby.

Are there any ideas of why and what could it be?

Thank you.

On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Sergey Konoplev gray...@gmail.com writes:
 Just curious, what is the planned date for the next minor release, and
 BTW where is it possible to see the roadmap for minor releases?

 There is no planned date, and certainly no roadmap.  We make minor
 releases when the core team judges that enough (or severe enough)
 fixes have accumulated since the last time.  Historically we've averaged
 about four minor releases a year, but that's not set in stone anywhere.

 regards, tom lane



-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-11 Thread Simon Riggs
On 11 June 2013 04:36, Sergey Konoplev gray...@gmail.com wrote:
 Hi,

 On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev gray...@gmail.com wrote:
 On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
 The attached patch fixes this although I don't like the way it knowledge of 
 the
 point up to which StartupSUBTRANS zeroes pages is handled.

 One month has passed since the patched version was installed in our
 production environment and can confirm that everything works perfect.
 Thank you very much for your prompt help, Andres.

 Are there any plans to commit this patch and what version it is going
 to be done to?

 Thank you.

I'll be committing this soon, since we're likely coming up to the next
point release soon.

Thanks for the reminder.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-11 Thread Sergey Konoplev
On Mon, Jun 10, 2013 at 11:43 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 11 June 2013 04:36, Sergey Konoplev gray...@gmail.com wrote:
 Are there any plans to commit this patch and what version it is going
 to be done to?

 I'll be committing this soon, since we're likely coming up to the next
 point release soon.

I see, thank you.

Just curious, what is the planned date for the next minor release, and
BTW where is it possible to see the roadmap for minor releases?

 Thanks for the reminder.

You are welcome.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-11 Thread Tom Lane
Sergey Konoplev gray...@gmail.com writes:
 Just curious, what is the planned date for the next minor release, and
 BTW where is it possible to see the roadmap for minor releases?

There is no planned date, and certainly no roadmap.  We make minor
releases when the core team judges that enough (or severe enough)
fixes have accumulated since the last time.  Historically we've averaged
about four minor releases a year, but that's not set in stone anywhere.

regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-05-09 Thread Sergey Konoplev
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 The attached patch fixes this although I don't like the way it knowledge of 
 the
 point up to which StartupSUBTRANS zeroes pages is handled.

One month has passed since the patched version was installed in our
production environment and can confirm that everything works perfect.
Thank you very much for your prompt help, Andres.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Sergey Konoplev
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 The attached patch fixes this although I don't like the way it knowledge of 
 the
 point up to which StartupSUBTRANS zeroes pages is handled.

Thank you for the patch, Andres.

Is it included in 9.2.4?

BTW, it has happened again and I am going to make a copy of the
cluster to be able to provide you some extra information. Do you still
need it?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Andres Freund
On 2013-04-05 07:10:12 -0700, Sergey Konoplev wrote:
 On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote:
  The attached patch fixes this although I don't like the way it knowledge of 
  the
  point up to which StartupSUBTRANS zeroes pages is handled.
 
 Thank you for the patch, Andres.
 
 Is it included in 9.2.4?

No. Too late for that. It hasn't bee committed yet.

 BTW, it has happened again and I am going to make a copy of the
 cluster to be able to provide you some extra information. Do you still
 need it?

Cool. It would be very helpful if you could apply the patch and verify
that it works, it has been written somewhat blindly. Also I am afraid
that at least last time there was a second bug involved.

Could you show the log?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Sergey Konoplev
On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund and...@2ndquadrant.com wrote:
 Cool. It would be very helpful if you could apply the patch and verify
 that it works, it has been written somewhat blindly. Also I am afraid
 that at least last time there was a second bug involved.

Okay, I will try to talk to admins but I am afraid it could take long.

 Could you show the log?

2013-04-05 17:26:31 MSK 2113 @ from  [vxid: txid:0] [] LOG:  database
system was shut down in recovery at 2013-04-05 17:18:02 MSK
2013-04-05 17:26:32 MSK 2113 @ from  [vxid: txid:0] [] LOG:  entering
standby mode
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  redo
starts at 25BD/907338F8
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
pg_subtrans/28E5 doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686136255: subxacts: 686137344 686137345
686137346 686137347 686137348 686137349 686137350 686137351 686137352
686137353 686137354 686137355 686137356 686137357 686137358 686137359
686137360 686137361 686137362 686137363 686137364 686137365 686137366
686137367 686137368 686137369 686137370 686137371 686137372 686137373
686137374 686137375 686137376 686137377 686137378 686137379 686137380
686137381 686137382 686137383 686137384 686137385 686137386 686137387
686137388 686137389 686137390 686137391 686137392 686137393 686137394
686137395 686137396 686137397 686137398 686137399 686137400 686137401
686137402 686137403 686137404 686137405 686137406 686137407
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
pg_subtrans/28E5 doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686136255: subxacts: 686139330 686139331
686139332 686139333 686139334 686139335 686139336 686139337 686139338
686139339 686139340 686139341 686139342 686139343 686139344 686139345
686139346 686139347 686139348 686139349 686139350 686139351 686139352
686139353 686139354 686139355 686139356 686139357 686139358 686139359
686139360 686139361 686139362 686139363 686139364 686139365 686139366
686139367 686139368 686139369 686139370 686139371 686139372 686139373
686139374 686139375 686139376 686139377 686139378 686139379 686139380
686139381 686139382 686139383 686139384 686139385 686139386 686139387
686139388 686139389 686139390 686139391 686139392 686139393

[some more like this]

2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
pg_subtrans/28E6 doesn't exist, reading as zeroes
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686216055: subxacts: 686222447 686222448
686222449 686222450 686222451 686222452 686222453 686222454 686222455
686222456 686222457 686222459 686222460 686222461 686222462 686222463
686222464 686222502 686222561 686222647 686222722 686223272 686223359
686223360 686223361 686223363 686223364 686223365 686223366 686223367
686223368 686223369 686223370 686223371 686223372 686223373 686223374
686223375 686223376 686223377 686223378 686223379 686223380 686223381
686223382 686223383 686223384 686223385 686223386 686223387 686223388
686223389 686223390 686223391 686223392 686223393 686223394 686223395
686223396 686223397 686223398 686223399 686223400 686223401
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] FATAL:
could not access status of transaction 686225586
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] DETAIL:
Could not read from file pg_subtrans/28E6 at offset 253952: Success.
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686225585: subxacts: 686225586 686225587
686225588 686225589 686225590 686225591 686225592 686225593 686225594
686225595 686225596 686225597 686225598 686225599 686225600 686225601
686225602 686225603 686225604 686225605 686225606 686225607 686225608
686225609 686225610 686225611 686225612 686225613 686225614 686225615
686225616 686225617 686225621 686225622 686225625 686225626 686225628
686225632 686225633 686225636 686225637 686225638 686225639 686225640
686225641 686225644 686225645 686225646 686225649 686225650 686225657
686225658 686225661 686225662 686225665 686225666 686225670 686225671
686225672 686225673 686225678 686225679 686225684 686225685
2013-04-05 17:26:36 MSK 2110 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 2113) exited with exit code 1
2013-04-05 17:26:36 MSK 2110 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes


--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Andres Freund
On 2013-04-05 07:22:08 -0700, Sergey Konoplev wrote:
 On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund and...@2ndquadrant.com wrote:
  Cool. It would be very helpful if you could apply the patch and verify
  that it works, it has been written somewhat blindly. Also I am afraid
  that at least last time there was a second bug involved.
 
 Okay, I will try to talk to admins but I am afraid it could take long.

Ok.
 
  Could you show the log?
 
 2013-04-05 17:26:31 MSK 2113 @ from  [vxid: txid:0] [] LOG:  database
 system was shut down in recovery at 2013-04-05 17:18:02 MSK
 2013-04-05 17:26:32 MSK 2113 @ from  [vxid: txid:0] [] LOG:  entering
 standby mode
 2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  redo
 starts at 25BD/907338F8
 2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
 pg_subtrans/28E5 doesn't exist, reading as zeroes

Looks like it could be fixed by the patch. But that seems to imply that
you restarted not long before that? Could you check if theres a
different error before those?

Greetings,

Andres Freund

PS: The tander.ru addresses seem to bounce all mail I send them...

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Sergey Konoplev
On Fri, Apr 5, 2013 at 7:33 AM, Andres Freund and...@2ndquadrant.com wrote:
 Looks like it could be fixed by the patch. But that seems to imply that
 you restarted not long before that? Could you check if theres a
 different error before those?

Yes it had happened straight after restart this time. There are no any
errors in logs before it.


--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-02 Thread Andres Freund
On 2013-04-02 12:10:12 +0200, Andres Freund wrote:
 On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
  On 30 March 2013 17:21, Andres Freund and...@2ndquadrant.com wrote:
  
   So if the xid is later than latestObservedXid we extend subtrans one by
   one. So far so good. But we initialize it in
   ProcArrayApplyRecoveryInfo() when consistency is initially reached:
latestObservedXid = running-nextXid;
TransactionIdRetreat(latestObservedXid);
   Before that subtrans has initially been started up with:
   if (wasShutdown)
   oldestActiveXID = 
   PrescanPreparedTransactions(xids, nxids);
   else
   oldestActiveXID = 
   checkPoint.oldestActiveXid;
   ...
   StartupSUBTRANS(oldestActiveXID);
  
   That means its only initialized up to checkPoint.oldestActiveXid. As it
   can take some time till we reach consistency it seems rather plausible
   that there now will be a gap in initilized pages. From
   checkPoint.oldestActiveXid to running-nextXid if there are pages
   inbetween.
  
  That was an old bug.
  
  StartupSUBTRANS() now explicitly fills that gap. Are you saying it
  does that incorrectly? How?
 
 Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
 between the call to Startup* and the first call to ExtendSUBTRANS. The
 latter is only called *after* we reached STANDBY_INITIALIZED via
 ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
 checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
 running-nextXid - 1. There very well can be a gap inbetween.
 The window isn't terribly big but if you use subtransactions as heavily
 as Sergey seems to be it doesn't seem unlikely to hit it.
 
 Let me come up with a testcase and patch.

Developing a testcase was trivial, pgbench running the following function:
CREATE OR REPLACE FUNCTION recurse_and_assign_txid(level bigint DEFAULT 0)
RETURNS bigint
LANGUAGE plpgsql AS $b$
BEGIN
IF level  500 THEN
RETURN recurse_and_assign_txid(level + 1);
ELSE
-- assign xid in subtxn and parents
CREATE TEMPORARY TABLE foo();
DROP TABLE foo;
RETURN txid_current()::bigint;
END IF;
EXCEPTION WHEN others THEN
RAISE NOTICE 'unexpected';
END
$b$;

When now restarting a standby (so it restarts from another checkpoint) it
frequently crashed with various errors:
* pg_subtrans/xxx does not exist
* (warning) pg_subtrans page does not exist, assuming zero
* xid overwritten in SubTransSetParent

So I think my theory is correct.

The attached patch fixes this although I don't like the way it knowledge of the
point up to which StartupSUBTRANS zeroes pages is handled.

Makes sense?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From cbb4e4d30ba1df5a3d370226545efe4966719867 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 2 Apr 2013 20:20:00 +0200
Subject: [PATCH] Ensure that SUBTRANS is initalized gaplessly when starting
 up HS

---
 src/backend/access/transam/xlog.c   |3 ++
 src/backend/storage/ipc/procarray.c |   53 +++
 src/include/storage/procarray.h |1 +
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 07c68ad..4a9462e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5391,6 +5391,9 @@ StartupXLOG(void)
 oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
 
+			/* Tell procarray about the range of xids it has to deal with */
+			ProcArrayInitRecovery(ShmemVariableCache-nextXid);
+
 			/*
 			 * Startup commit log and subtrans only. Other SLRUs are not
 			 * maintained during recovery and need not be started yet.
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..53a5aa2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -470,6 +470,28 @@ ProcArrayClearTransaction(PGPROC *proc)
 }
 
 /*
+ * ProcArrayInitRecovery -- initialize recovery xid mgmt environment
+ *
+ * Remember up to where the startup process initialized the CLOG and subtrans
+ * so we can ensure its initialized gaplessly up to the point where necessary
+ * while in recovery.
+ */
+void
+ProcArrayInitRecovery(TransactionId initializedUptoXID)
+{
+	Assert(standbyState == STANDBY_INITIALIZED);
+	Assert(TransactionIdIsNormal(initializedUptoXID));
+
+	/*
+	 * we set latestObservedXid to the xid SUBTRANS has been initialized upto
+	 * so we can extend it from that point onwards when we reach a consistent
+	 * state in ProcArrayApplyRecoveryInfo().
+	 */
+	latestObservedXid = 

Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-01 Thread Simon Riggs
On 30 March 2013 17:21, Andres Freund and...@2ndquadrant.com wrote:

 So if the xid is later than latestObservedXid we extend subtrans one by
 one. So far so good. But we initialize it in
 ProcArrayApplyRecoveryInfo() when consistency is initially reached:
  latestObservedXid = running-nextXid;
  TransactionIdRetreat(latestObservedXid);
 Before that subtrans has initially been started up with:
 if (wasShutdown)
 oldestActiveXID = 
 PrescanPreparedTransactions(xids, nxids);
 else
 oldestActiveXID = checkPoint.oldestActiveXid;
 ...
 StartupSUBTRANS(oldestActiveXID);

 That means its only initialized up to checkPoint.oldestActiveXid. As it
 can take some time till we reach consistency it seems rather plausible
 that there now will be a gap in initilized pages. From
 checkPoint.oldestActiveXid to running-nextXid if there are pages
 inbetween.

That was an old bug.

StartupSUBTRANS() now explicitly fills that gap. Are you saying it
does that incorrectly? How?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-30 Thread Andres Freund
On 2013-03-29 14:53:26 -0700, Sergey Konoplev wrote:
 On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund and...@2ndquadrant.com wrote:
  I have to admit, I find it a bit confusing that so many people report a
  bug and then immediately destroy all evidence of the bug. Just seems to
  a happen a bit too frequently.
 
 You see, businesses usually need it up ASAP again. Sorry, I must have
 note down the output of pg_controldata straight after it got broken, I
 just have not came up to it.

But the business will also need the standby working correctly in case
of a critical incident of a primary. So it should have quite an interest
in fixing bugs in that area. Yes, I realize thats not always easy to do
:(.

  Thats not a pg_controldata output from the broken replica though, or is
  it? I guess its from a new standby?
 
 That was the output from the standby that was rsync-ed on top of the
 broken one. I thought you might find something useful in it.

 Can I test your guess some other way? And what was the guess?

Don't think you can easily test it. And after reading more code I am
pretty sure my original guess was bogus. As was my second. And third ;)

But I think I see what could be going on:

During HS we maintain pg_subtrans so we can deal with more than
PGPROC_MAX_CACHED_SUBXIDS in one TX. For that we need to regularly
extend subtrans so the pages are initialized when we setup the
topxid-subxid mapping in ProcArrayApplyXidAssignment(). The call to
ExtendSUBTRANS happens in RecordKnownAssignedTransactionIds() which is
called from several places, including ProcArrayApplyXidAssignment().
The logic it uses is:
if (TransactionIdFollows(xid, latestObservedXid))
{
TransactionId next_expected_xid;

/*
 * Extend clog and subtrans like we do in GetNewTransactionId() 
during
 * normal operation using individual extend steps. Typical case
 * requires almost no activity.
 */
next_expected_xid = latestObservedXid;
TransactionIdAdvance(next_expected_xid);
while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
{
ExtendCLOG(next_expected_xid);
ExtendSUBTRANS(next_expected_xid);

TransactionIdAdvance(next_expected_xid);
}

So if the xid is later than latestObservedXid we extend subtrans one by
one. So far so good. But we initialize it in
ProcArrayApplyRecoveryInfo() when consistency is initially reached:
 latestObservedXid = running-nextXid;
 TransactionIdRetreat(latestObservedXid);
Before that subtrans has initially been started up with:
if (wasShutdown)
oldestActiveXID = 
PrescanPreparedTransactions(xids, nxids);
else
oldestActiveXID = checkPoint.oldestActiveXid;
...
StartupSUBTRANS(oldestActiveXID);

That means its only initialized up to checkPoint.oldestActiveXid. As it
can take some time till we reach consistency it seems rather plausible
that there now will be a gap in initilized pages. From
checkPoint.oldestActiveXid to running-nextXid if there are pages
inbetween.

Does that explanation sound about right to anybody else? I'll provide a
patch for the issue in a while, for now I'll try to reproduce it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-30 Thread ajmcello
unsubscribe


On Thu, Mar 28, 2013 at 10:30 PM, Sergey Konoplev gray...@gmail.com wrote:

 Hi all,

 A couple of days ago I found the replica stopped after the PANIC message:

 PANIC:  WAL contains references to invalid pages

 When I tried to restart it I got this FATAL:

 FATAL:  could not access status of transaction 280557568

 Below is the description of the server and information from PostgreSQL
 and system logs. After googling the problem I have found nothing like
 this.

 Any thoughts of what it could be and how to prevent it in the future?

 Hardware:

 IBM System x3650 M4, 148GB RAM, NAS

 Software:

 PostgreSQL 9.2.3, yum.postgresql.org
 CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

 Configuration:

 listen_addresses = '*'
 max_connections = 550
 shared_buffers = 35GB
 work_mem = 256MB
 maintenance_work_mem = 1GB
 bgwriter_delay = 10ms
 bgwriter_lru_multiplier = 10.0
 effective_io_concurrency = 32
 wal_level = hot_standby
 synchronous_commit = off
 checkpoint_segments = 1024
 checkpoint_timeout = 1h
 checkpoint_completion_target = 0.9
 checkpoint_warning = 5min
 max_wal_senders = 3
 wal_keep_segments = 2048
 hot_standby = on
 max_standby_streaming_delay = 5min
 hot_standby_feedback = on
 effective_cache_size = 133GB
 log_directory = '/var/log/pgsql'
 log_filename = 'postgresql-%Y-%m-%d.log'
 log_checkpoints = on
 log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
 log_lock_waits = on
 log_statement = 'ddl'
 log_timezone = 'W-SU'
 track_activity_query_size = 4096
 autovacuum_max_workers = 5
 autovacuum_naptime = 5s
 autovacuum_vacuum_scale_factor = 0.05
 autovacuum_analyze_scale_factor = 0.05
 autovacuum_vacuum_cost_delay = 5ms
 datestyle = 'iso, dmy'
 timezone = 'W-SU'
 lc_messages = 'en_US.UTF-8'
 lc_monetary = 'ru_RU.UTF-8'
 lc_numeric = 'ru_RU.UTF-8'
 lc_time = 'ru_RU.UTF-8'
 default_text_search_config = 'pg_catalog.russian'

 System:

 # Controls the maximum shared segment size, in bytes
 kernel.shmmax = 53287555072

 # Controls the maximum number of shared memory segments, in pages
 kernel.shmall = 13009657

 # Maximum number of file-handles
 fs.file-max = 65535

 # pdflush tuning to prevent lag spikes
 vm.dirty_ratio = 10
 vm.dirty_background_ratio = 1
 vm.dirty_expire_centisecs = 499

 # Prevent the scheduler breakdown
 kernel.sched_migration_cost = 500

 # Turned off to provide more CPU to PostgreSQL
 kernel.sched_autogroup_enabled = 0

 # Setup hugepages
 vm.hugetlb_shm_group = 26
 vm.hugepages_treat_as_movable = 0
 vm.nr_overcommit_hugepages = 512

 # The Huge Page Size is 2048kB, so for 35GB shared buffers the number is
 17920
 vm.nr_hugepages = 17920

 # Turn off the NUMA local pages reclaim as it leads to wrong caching
 strategy for databases
 vm.zone_reclaim_mode = 0

 Environment:

 HUGETLB_SHM=yes
 LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
 export HUGETLB_SHM LD_PRELOAD

 When it is stopped:

 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
 restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
 log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
 sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
 average=0.000 s
 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
 restart point at 2538/6E154AC0
 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
 completed transaction was at log time 2013-03-26 11:50:31.613948+04
 2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
 restartpoint starting: xlog
 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
 page 451 of relation base/16436/2686702648 is uninitialized
 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
 xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
 lastBlockVacuumed 0
 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
 contains references to invalid pages
 2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
 xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
 lastBlockVacuumed 0
 2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
 process (PID 3773) was terminated by signal 6: Aborted
 2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
 terminating any other active server processes

 From /var/log/messages:

 Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
 order:8, mode:0xd0
 Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
 2.6.32-279.22.1.el6.x86_64 #1
 Mar 26 10:50:52 tms2 kernel: : Call Trace:
 Mar 26 10:50:52 tms2 kernel: : [8112343f] ?
 __alloc_pages_nodemask+0x77f/0x940
 Mar 26 10:50:52 tms2 kernel: : [8115d3e2] ?
 kmem_getpages+0x62/0x170
 Mar 26 10:50:52 tms2 kernel: : [8115dffa] ?
 fallback_alloc+0x1ba/0x270
 Mar 26 10:50:52 tms2 kernel: : [8115da4f] ?
 cache_grow+0x2cf/0x320
 Mar 26 10:50:52 tms2 kernel: : [8115dd79] ?
 cache_alloc_node+0x99/0x160
 Mar 26 10:50:52 tms2 kernel: : [813fd455] ?
 

Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread anara...@anarazel.de
Hi

Sergey Konoplev gray...@gmail.com schrieb:

Hi all,

A couple of days ago I found the replica stopped after the PANIC
message:

PANIC:  WAL contains references to invalid pages

When I tried to restart it I got this FATAL:

FATAL:  could not access status of transaction 280557568

Below is the description of the server and information from PostgreSQL
and system logs. After googling the problem I have found nothing like
this.

Any thoughts of what it could be and how to prevent it in the future?

I think I See whats going on. Do you still have the datadir available? If so, 
could you send the pg_controldata output?


Andres

Hardware:

IBM System x3650 M4, 148GB RAM, NAS

Software:

PostgreSQL 9.2.3, yum.postgresql.org
CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

Configuration:

listen_addresses = '*'
max_connections = 550
shared_buffers = 35GB
work_mem = 256MB
maintenance_work_mem = 1GB
bgwriter_delay = 10ms
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 32
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 1024
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 5min
max_wal_senders = 3
wal_keep_segments = 2048
hot_standby = on
max_standby_streaming_delay = 5min
hot_standby_feedback = on
effective_cache_size = 133GB
log_directory = '/var/log/pgsql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_checkpoints = on
log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'W-SU'
track_activity_query_size = 4096
autovacuum_max_workers = 5
autovacuum_naptime = 5s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 5ms
datestyle = 'iso, dmy'
timezone = 'W-SU'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'

System:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 53287555072

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 13009657

# Maximum number of file-handles
fs.file-max = 65535

# pdflush tuning to prevent lag spikes
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 499

# Prevent the scheduler breakdown
kernel.sched_migration_cost = 500

# Turned off to provide more CPU to PostgreSQL
kernel.sched_autogroup_enabled = 0

# Setup hugepages
vm.hugetlb_shm_group = 26
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 512

# The Huge Page Size is 2048kB, so for 35GB shared buffers the number
is 17920
vm.nr_hugepages = 17920

# Turn off the NUMA local pages reclaim as it leads to wrong caching
strategy for databases
vm.zone_reclaim_mode = 0

Environment:

HUGETLB_SHM=yes
LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
export HUGETLB_SHM LD_PRELOAD

When it is stopped:

2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
average=0.000 s
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
restart point at 2538/6E154AC0
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
completed transaction was at log time 2013-03-26 11:50:31.613948+04
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint starting: xlog
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
page 451 of relation base/16436/2686702648 is uninitialized
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 3773) was terminated by signal 6: Aborted
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

From /var/log/messages:

Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
order:8, mode:0xd0
Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
2.6.32-279.22.1.el6.x86_64 #1
Mar 26 10:50:52 tms2 kernel: : Call Trace:
Mar 26 10:50:52 tms2 kernel: : [8112343f] ?
__alloc_pages_nodemask+0x77f/0x940
Mar 26 10:50:52 tms2 kernel: : [8115d3e2] ?
kmem_getpages+0x62/0x170
Mar 26 10:50:52 tms2 kernel: : [8115dffa] ?
fallback_alloc+0x1ba/0x270
Mar 26 10:50:52 tms2 kernel: : [8115da4f] ?
cache_grow+0x2cf/0x320
Mar 26 10:50:52 tms2 kernel: : [8115dd79] ?
cache_alloc_node+0x99/0x160
Mar 26 10:50:52 tms2 kernel: : [813fd455] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 

Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread Sergey Konoplev
On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de
and...@anarazel.de wrote:
 I think I See whats going on. Do you still have the datadir available? If so, 
 could you send the pg_controldata output?

I have already rebuilt the replica, however below is the output if it is useful:

pg_control version number:922
Catalog version number:   201204301
Database system identifier:   5858109675396804534
Database cluster state:   in archive recovery
pg_control last modified: Сбт 30 Мар 2013 00:21:11
Latest checkpoint location:   258B/BDBBE748
Prior checkpoint location:258B/86DABCB8
Latest checkpoint's REDO location:258B/8B78BED0
Latest checkpoint's TimeLineID:   2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/423386899
Latest checkpoint's NextOID:  2758636912
Latest checkpoint's NextMultiXactId:  103920
Latest checkpoint's NextMultiOffset:  431309
Latest checkpoint's oldestXID:225634745
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  421766298
Time of latest checkpoint:Птн 29 Мар 2013 22:33:01
Minimum recovery ending location: 258C/14AA5FA0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:hot_standby
Current max_connections setting:  550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value



 Andres

Hardware:

IBM System x3650 M4, 148GB RAM, NAS

Software:

PostgreSQL 9.2.3, yum.postgresql.org
CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

Configuration:

listen_addresses = '*'
max_connections = 550
shared_buffers = 35GB
work_mem = 256MB
maintenance_work_mem = 1GB
bgwriter_delay = 10ms
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 32
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 1024
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 5min
max_wal_senders = 3
wal_keep_segments = 2048
hot_standby = on
max_standby_streaming_delay = 5min
hot_standby_feedback = on
effective_cache_size = 133GB
log_directory = '/var/log/pgsql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_checkpoints = on
log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'W-SU'
track_activity_query_size = 4096
autovacuum_max_workers = 5
autovacuum_naptime = 5s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 5ms
datestyle = 'iso, dmy'
timezone = 'W-SU'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'

System:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 53287555072

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 13009657

# Maximum number of file-handles
fs.file-max = 65535

# pdflush tuning to prevent lag spikes
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 499

# Prevent the scheduler breakdown
kernel.sched_migration_cost = 500

# Turned off to provide more CPU to PostgreSQL
kernel.sched_autogroup_enabled = 0

# Setup hugepages
vm.hugetlb_shm_group = 26
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 512

# The Huge Page Size is 2048kB, so for 35GB shared buffers the number
is 17920
vm.nr_hugepages = 17920

# Turn off the NUMA local pages reclaim as it leads to wrong caching
strategy for databases
vm.zone_reclaim_mode = 0

Environment:

HUGETLB_SHM=yes
LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
export HUGETLB_SHM LD_PRELOAD

When it is stopped:

2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
average=0.000 s
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
restart point at 2538/6E154AC0
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
completed transaction was at log time 2013-03-26 11:50:31.613948+04
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint starting: xlog
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
page 451 of relation 

Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread Andres Freund
On 2013-03-29 14:19:59 -0700, Sergey Konoplev wrote:
 On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de
 and...@anarazel.de wrote:
  I think I See whats going on. Do you still have the datadir available? If 
  so, could you send the pg_controldata output?
 
 I have already rebuilt the replica, however below is the output if it is 
 useful:

I have to admit, I find it a bit confusing that so many people report a
bug and then immediately destroy all evidence of the bug. Just seems to
a happen a bit too frequently.

 pg_control version number:922
 Catalog version number:   201204301
 Database system identifier:   5858109675396804534
 Database cluster state:   in archive recovery
 pg_control last modified: Сбт 30 Мар 2013 00:21:11
 Latest checkpoint location:   258B/BDBBE748
 Prior checkpoint location:258B/86DABCB8
 Latest checkpoint's REDO location:258B/8B78BED0
 Latest checkpoint's TimeLineID:   2
 Latest checkpoint's full_page_writes: on
 Latest checkpoint's NextXID:  0/423386899
 Latest checkpoint's NextOID:  2758636912
 Latest checkpoint's NextMultiXactId:  103920
 Latest checkpoint's NextMultiOffset:  431309
 Latest checkpoint's oldestXID:225634745
 Latest checkpoint's oldestXID's DB:   16436
 Latest checkpoint's oldestActiveXID:  421766298
 Time of latest checkpoint:Птн 29 Мар 2013 22:33:01

Thats not a pg_controldata output from the broken replica though, or is
it? I guess its from a new standby?

Andres
Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread Sergey Konoplev
On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund and...@2ndquadrant.com wrote:
 I have to admit, I find it a bit confusing that so many people report a
 bug and then immediately destroy all evidence of the bug. Just seems to
 a happen a bit too frequently.

You see, businesses usually need it up ASAP again. Sorry, I must have
note down the output of pg_controldata straight after it got broken, I
just have not came up to it.

 Thats not a pg_controldata output from the broken replica though, or is
 it? I guess its from a new standby?

That was the output from the standby that was rsync-ed on top of the
broken one. I thought you might find something useful in it.

Can I test your guess some other way? And what was the guess?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-28 Thread Sergey Konoplev
Hi all,

A couple of days ago I found the replica stopped after the PANIC message:

PANIC:  WAL contains references to invalid pages

When I tried to restart it I got this FATAL:

FATAL:  could not access status of transaction 280557568

Below is the description of the server and information from PostgreSQL
and system logs. After googling the problem I have found nothing like
this.

Any thoughts of what it could be and how to prevent it in the future?

Hardware:

IBM System x3650 M4, 148GB RAM, NAS

Software:

PostgreSQL 9.2.3, yum.postgresql.org
CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

Configuration:

listen_addresses = '*'
max_connections = 550
shared_buffers = 35GB
work_mem = 256MB
maintenance_work_mem = 1GB
bgwriter_delay = 10ms
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 32
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 1024
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 5min
max_wal_senders = 3
wal_keep_segments = 2048
hot_standby = on
max_standby_streaming_delay = 5min
hot_standby_feedback = on
effective_cache_size = 133GB
log_directory = '/var/log/pgsql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_checkpoints = on
log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'W-SU'
track_activity_query_size = 4096
autovacuum_max_workers = 5
autovacuum_naptime = 5s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 5ms
datestyle = 'iso, dmy'
timezone = 'W-SU'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'

System:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 53287555072

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 13009657

# Maximum number of file-handles
fs.file-max = 65535

# pdflush tuning to prevent lag spikes
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 499

# Prevent the scheduler breakdown
kernel.sched_migration_cost = 500

# Turned off to provide more CPU to PostgreSQL
kernel.sched_autogroup_enabled = 0

# Setup hugepages
vm.hugetlb_shm_group = 26
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 512

# The Huge Page Size is 2048kB, so for 35GB shared buffers the number is 17920
vm.nr_hugepages = 17920

# Turn off the NUMA local pages reclaim as it leads to wrong caching
strategy for databases
vm.zone_reclaim_mode = 0

Environment:

HUGETLB_SHM=yes
LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
export HUGETLB_SHM LD_PRELOAD

When it is stopped:

2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
average=0.000 s
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
restart point at 2538/6E154AC0
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
completed transaction was at log time 2013-03-26 11:50:31.613948+04
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint starting: xlog
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
page 451 of relation base/16436/2686702648 is uninitialized
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 3773) was terminated by signal 6: Aborted
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

From /var/log/messages:

Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
order:8, mode:0xd0
Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
2.6.32-279.22.1.el6.x86_64 #1
Mar 26 10:50:52 tms2 kernel: : Call Trace:
Mar 26 10:50:52 tms2 kernel: : [8112343f] ?
__alloc_pages_nodemask+0x77f/0x940
Mar 26 10:50:52 tms2 kernel: : [8115d3e2] ? kmem_getpages+0x62/0x170
Mar 26 10:50:52 tms2 kernel: : [8115dffa] ? fallback_alloc+0x1ba/0x270
Mar 26 10:50:52 tms2 kernel: : [8115da4f] ? cache_grow+0x2cf/0x320
Mar 26 10:50:52 tms2 kernel: : [8115dd79] ?
cache_alloc_node+0x99/0x160
Mar 26 10:50:52 tms2 kernel: : [813fd455] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [8115eb49] ? __kmalloc+0x189/0x220
Mar 26 10:50:52 tms2 kernel: : [813fd455] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: :