Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-10-10 Thread Sergey Konoplev
On Thu, Oct 10, 2013 at 2:48 PM, Alvaro Herrera wrote:
>> Could you please give me a hint on how to check whether this patch was
>> included in 9.2.5 or not?
>
> Yes, this was committed in June:
>
> commit 99ee15b315c187045a95db7b27fd9d866aea93e0
> Author: Simon Riggs 
> Date:   Sun Jun 23 11:05:02 2013 +0100

Good news, thank you.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-10-10 Thread Alvaro Herrera
Sergey Konoplev wrote:
> On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane  wrote:
> > Sergey Konoplev  writes:
> >> Just curious, what is the planned date for the next minor release, and
> >> BTW where is it possible to see the roadmap for minor releases?
> >
> > There is no planned date, and certainly no "roadmap".  We make minor
> > releases when the core team judges that enough (or severe enough)
> > fixes have accumulated since the last time.  Historically we've averaged
> > about four minor releases a year, but that's not set in stone anywhere.
> 
> Could you please give me a hint on how to check whether this patch was
> included in 9.2.5 or not?

Yes, this was committed in June:

commit 99ee15b315c187045a95db7b27fd9d866aea93e0
Author: Simon Riggs 
Date:   Sun Jun 23 11:05:02 2013 +0100

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-10-10 Thread Sergey Konoplev
On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane  wrote:
> Sergey Konoplev  writes:
>> Just curious, what is the planned date for the next minor release, and
>> BTW where is it possible to see the roadmap for minor releases?
>
> There is no planned date, and certainly no "roadmap".  We make minor
> releases when the core team judges that enough (or severe enough)
> fixes have accumulated since the last time.  Historically we've averaged
> about four minor releases a year, but that's not set in stone anywhere.

Could you please give me a hint on how to check whether this patch was
included in 9.2.5 or not?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-08-19 Thread Sergey Konoplev
Hi all,

I recently noticed that I continue to receive "allocation failure"
messages from postmaster in /var/log/messages on the server where the
problem occurred.

Aug 17 23:00:51 tms2 kernel: : postmaster: page allocation failure.
order:5, mode:0xd0
Aug 17 23:00:51 tms2 kernel: : Pid: 21223, comm: postmaster Not
tainted 2.6.32-279.22.1.el6.x86_64 #1
Aug 17 23:00:51 tms2 kernel: : Call Trace:
Aug 17 23:00:51 tms2 kernel: : [] ?
__alloc_pages_nodemask+0x77f/0x940
Aug 17 23:00:51 tms2 kernel: : [] ? kmem_getpages+0x62/0x170
Aug 17 23:00:51 tms2 kernel: : [] ? fallback_alloc+0x1ba/0x270
Aug 17 23:00:51 tms2 kernel: : [] ? cache_grow+0x2cf/0x320
Aug 17 23:00:51 tms2 kernel: : [] ?
cache_alloc_node+0x99/0x160
Aug 17 23:00:51 tms2 kernel: : [] ?
dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [] ? __kmalloc+0x189/0x220
Aug 17 23:00:51 tms2 kernel: : [] ?
dma_pin_iovec_pages+0xb5/0x230
Aug 17 23:00:51 tms2 kernel: : [] ? lock_sock_nested+0xac/0xc0
Aug 17 23:00:51 tms2 kernel: : [] ? tcp_recvmsg+0x4ca/0xe80
Aug 17 23:00:51 tms2 kernel: : [] ? inet_recvmsg+0x5a/0x90
Aug 17 23:00:51 tms2 kernel: : [] ? sock_recvmsg+0x133/0x160
Aug 17 23:00:51 tms2 kernel: : [] ?
autoremove_wake_function+0x0/0x40
Aug 17 23:00:51 tms2 kernel: : [] ? do_sync_read+0xfa/0x140
Aug 17 23:00:51 tms2 kernel: : [] ? sys_recvfrom+0xee/0x180
Aug 17 23:00:51 tms2 kernel: : [] ?
poll_select_set_timeout+0x8d/0xa0
Aug 17 23:00:51 tms2 kernel: : [] ?
audit_syscall_entry+0x1d7/0x200
Aug 17 23:00:51 tms2 kernel: : [] ?
system_call_fastpath+0x16/0x1b

Apart from these messages everything is just fine. The server is a passive
(very rarely queried) hot standby.

Are there any ideas about why this happens and what it could be?

Thank you.

On Tue, Jun 11, 2013 at 6:50 AM, Tom Lane  wrote:
> Sergey Konoplev  writes:
>> Just curious, what is the planned date for the next minor release, and
>> BTW where is it possible to see the roadmap for minor releases?
>
> There is no planned date, and certainly no "roadmap".  We make minor
> releases when the core team judges that enough (or severe enough)
> fixes have accumulated since the last time.  Historically we've averaged
> about four minor releases a year, but that's not set in stone anywhere.
>
> regards, tom lane



-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-11 Thread Tom Lane
Sergey Konoplev  writes:
> Just curious, what is the planned date for the next minor release, and
> BTW where is it possible to see the roadmap for minor releases?

There is no planned date, and certainly no "roadmap".  We make minor
releases when the core team judges that enough (or severe enough)
fixes have accumulated since the last time.  Historically we've averaged
about four minor releases a year, but that's not set in stone anywhere.

regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-11 Thread Sergey Konoplev
On Mon, Jun 10, 2013 at 11:43 PM, Simon Riggs  wrote:
> On 11 June 2013 04:36, Sergey Konoplev  wrote:
>> Are there any plans to commit this patch, and which version is it going
>> to go into?
>
> I'll be committing this soon, since we're likely coming up to the next
> point release soon.

I see, thank you.

Just curious, what is the planned date for the next minor release, and
BTW where is it possible to see the roadmap for minor releases?

> Thanks for the reminder.

You are welcome.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-10 Thread Simon Riggs
On 11 June 2013 04:36, Sergey Konoplev  wrote:
> Hi,
>
> On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev  wrote:
>> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund  wrote:
>>> The attached patch fixes this, although I don't like the way the knowledge
>>> of the point up to which StartupSUBTRANS zeroes pages is handled.
>>
>> One month has passed since the patched version was installed in our
>> production environment and I can confirm that everything works perfectly.
>> Thank you very much for your prompt help, Andres.
>
> Are there any plans to commit this patch, and which version is it going
> to go into?
>
> Thank you.

I'll be committing this soon, since we're likely coming up to the next
point release soon.

Thanks for the reminder.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-06-10 Thread Sergey Konoplev
Hi,

On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev  wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund  wrote:
>> The attached patch fixes this, although I don't like the way the knowledge
>> of the point up to which StartupSUBTRANS zeroes pages is handled.
>
> One month has passed since the patched version was installed in our
> production environment and I can confirm that everything works perfectly.
> Thank you very much for your prompt help, Andres.

Are there any plans to commit this patch, and which version is it going
to go into?

Thank you.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-05-09 Thread Sergey Konoplev
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund  wrote:
> The attached patch fixes this, although I don't like the way the knowledge
> of the point up to which StartupSUBTRANS zeroes pages is handled.

One month has passed since the patched version was installed in our
production environment and I can confirm that everything works perfectly.
Thank you very much for your prompt help, Andres.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Sergey Konoplev
On Fri, Apr 5, 2013 at 7:33 AM, Andres Freund  wrote:
> Looks like it could be fixed by the patch. But that seems to imply that
> you restarted not long before that? Could you check if there's a
> different error before those?

Yes, it happened straight after a restart this time. There are no errors
in the logs before it.


--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Andres Freund
On 2013-04-05 07:22:08 -0700, Sergey Konoplev wrote:
> On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund  wrote:
> > Cool. It would be very helpful if you could apply the patch and verify
> > that it works, as it has been written somewhat blindly. Also I am afraid
> > that at least last time there was a second bug involved.
> 
> Okay, I will try to talk to the admins, but I am afraid it could take a while.

Ok.
 
> > Could you show the log?
> 
> 2013-04-05 17:26:31 MSK 2113 @ from  [vxid: txid:0] [] LOG:  database
> system was shut down in recovery at 2013-04-05 17:18:02 MSK
> 2013-04-05 17:26:32 MSK 2113 @ from  [vxid: txid:0] [] LOG:  entering
> standby mode
> 2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  redo
> starts at 25BD/907338F8
> 2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
> "pg_subtrans/28E5" doesn't exist, reading as zeroes

Looks like it could be fixed by the patch. But that seems to imply that
you restarted not long before that? Could you check if there's a
different error before those?

Greetings,

Andres Freund

PS: The tander.ru addresses seem to bounce all mail I send them...

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Sergey Konoplev
On Fri, Apr 5, 2013 at 7:15 AM, Andres Freund  wrote:
> Cool. It would be very helpful if you could apply the patch and verify
> that it works, as it has been written somewhat blindly. Also I am afraid
> that at least last time there was a second bug involved.

Okay, I will try to talk to the admins, but I am afraid it could take a while.

> Could you show the log?

2013-04-05 17:26:31 MSK 2113 @ from  [vxid: txid:0] [] LOG:  database
system was shut down in recovery at 2013-04-05 17:18:02 MSK
2013-04-05 17:26:32 MSK 2113 @ from  [vxid: txid:0] [] LOG:  entering
standby mode
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  redo
starts at 25BD/907338F8
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/28E5" doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686136255: subxacts: 686137344 686137345
686137346 686137347 686137348 686137349 686137350 686137351 686137352
686137353 686137354 686137355 686137356 686137357 686137358 686137359
686137360 686137361 686137362 686137363 686137364 686137365 686137366
686137367 686137368 686137369 686137370 686137371 686137372 686137373
686137374 686137375 686137376 686137377 686137378 686137379 686137380
686137381 686137382 686137383 686137384 686137385 686137386 686137387
686137388 686137389 686137390 686137391 686137392 686137393 686137394
686137395 686137396 686137397 686137398 686137399 686137400 686137401
686137402 686137403 686137404 686137405 686137406 686137407
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/28E5" doesn't exist, reading as zeroes
2013-04-05 17:26:32 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686136255: subxacts: 686139330 686139331
686139332 686139333 686139334 686139335 686139336 686139337 686139338
686139339 686139340 686139341 686139342 686139343 686139344 686139345
686139346 686139347 686139348 686139349 686139350 686139351 686139352
686139353 686139354 686139355 686139356 686139357 686139358 686139359
686139360 686139361 686139362 686139363 686139364 686139365 686139366
686139367 686139368 686139369 686139370 686139371 686139372 686139373
686139374 686139375 686139376 686139377 686139378 686139379 686139380
686139381 686139382 686139383 686139384 686139385 686139386 686139387
686139388 686139389 686139390 686139391 686139392 686139393

[some more like this]

2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] LOG:  file
"pg_subtrans/28E6" doesn't exist, reading as zeroes
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686216055: subxacts: 686222447 686222448
686222449 686222450 686222451 686222452 686222453 686222454 686222455
686222456 686222457 686222459 686222460 686222461 686222462 686222463
686222464 686222502 686222561 686222647 686222722 686223272 686223359
686223360 686223361 686223363 686223364 686223365 686223366 686223367
686223368 686223369 686223370 686223371 686223372 686223373 686223374
686223375 686223376 686223377 686223378 686223379 686223380 686223381
686223382 686223383 686223384 686223385 686223386 686223387 686223388
686223389 686223390 686223391 686223392 686223393 686223394 686223395
686223396 686223397 686223398 686223399 686223400 686223401
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] FATAL:
could not access status of transaction 686225586
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] DETAIL:
Could not read from file "pg_subtrans/28E6" at offset 253952: Success.
2013-04-05 17:26:36 MSK 2113 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo xid assignment xtop 686225585: subxacts: 686225586 686225587
686225588 686225589 686225590 686225591 686225592 686225593 686225594
686225595 686225596 686225597 686225598 686225599 686225600 686225601
686225602 686225603 686225604 686225605 686225606 686225607 686225608
686225609 686225610 686225611 686225612 686225613 686225614 686225615
686225616 686225617 686225621 686225622 686225625 686225626 686225628
686225632 686225633 686225636 686225637 686225638 686225639 686225640
686225641 686225644 686225645 686225646 686225649 686225650 686225657
686225658 686225661 686225662 686225665 686225666 686225670 686225671
686225672 686225673 686225678 686225679 686225684 686225685
2013-04-05 17:26:36 MSK 2110 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 2113) exited with exit code 1
2013-04-05 17:26:36 MSK 2110 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes


--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Andres Freund
On 2013-04-05 07:10:12 -0700, Sergey Konoplev wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund  wrote:
> > The attached patch fixes this, although I don't like the way the knowledge
> > of the point up to which StartupSUBTRANS zeroes pages is handled.
> 
> Thank you for the patch, Andres.
> 
> Is it included in 9.2.4?

No. Too late for that. It hasn't been committed yet.

> BTW, it has happened again and I am going to make a copy of the
> cluster to be able to provide you some extra information. Do you still
> need it?

Cool. It would be very helpful if you could apply the patch and verify
that it works, as it has been written somewhat blindly. Also I am afraid
that at least last time there was a second bug involved.

Could you show the log?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-05 Thread Sergey Konoplev
On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund  wrote:
> The attached patch fixes this, although I don't like the way the knowledge
> of the point up to which StartupSUBTRANS zeroes pages is handled.

Thank you for the patch, Andres.

Is it included in 9.2.4?

BTW, it has happened again and I am going to make a copy of the
cluster to be able to provide you some extra information. Do you still
need it?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-02 Thread Andres Freund
On 2013-04-02 12:10:12 +0200, Andres Freund wrote:
> On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> > On 30 March 2013 17:21, Andres Freund  wrote:
> > 
> > > So if the xid is later than latestObservedXid we extend subtrans one by
> > > one. So far so good. But we initialize it in
> > > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> > >  latestObservedXid = running->nextXid;
> > >  TransactionIdRetreat(latestObservedXid);
> > > Before that subtrans has initially been started up with:
> > > if (wasShutdown)
> > >     oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> > > else
> > >     oldestActiveXID = checkPoint.oldestActiveXid;
> > > ...
> > > StartupSUBTRANS(oldestActiveXID);
> > >
> > > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > > can take some time until we reach consistency, it seems rather plausible
> > > that there will now be a gap in initialized pages, from
> > > checkPoint.oldestActiveXid to running->nextXid, if there are pages
> > > in between.
> > 
> > That was an old bug.
> > 
> > StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> > does that incorrectly? How?
> 
> Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
> between the call to Startup* and the first call to ExtendSUBTRANS. The
> latter is only called *after* we reached STANDBY_INITIALIZED via
> ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
> checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
> running->nextXid - 1. There can very well be a gap in between.
> The window isn't terribly big, but if you use subtransactions as heavily
> as Sergey seems to, it doesn't seem unlikely that you'd hit it.
> 
> Let me come up with a testcase and patch.

Developing a testcase was trivial: pgbench running the following function:
CREATE OR REPLACE FUNCTION recurse_and_assign_txid(level bigint DEFAULT 0)
RETURNS bigint
LANGUAGE plpgsql AS $b$
BEGIN
    IF level < 500 THEN
        RETURN recurse_and_assign_txid(level + 1);
    ELSE
        -- assign xid in subtxn and parents
        CREATE TEMPORARY TABLE foo();
        DROP TABLE foo;
        RETURN txid_current()::bigint;
    END IF;
EXCEPTION WHEN others THEN
    RAISE NOTICE 'unexpected';
END
$b$;
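
A sketch of how this can be driven (the file name and pgbench parameters
below are illustrative, not necessarily the exact ones used):

-- recurse.sql: single-statement pgbench custom script; run it against the
-- primary with something like "pgbench -n -f recurse.sql -c 8 -T 300" while
-- a hot standby replays the WAL. Each call enters ~500 nested EXCEPTION
-- blocks, i.e. ~500 subtransactions, and the innermost CREATE TEMPORARY
-- TABLE forces xid assignment for them and their parents, so the standby
-- has to keep extending pg_subtrans during replay.
SELECT recurse_and_assign_txid(0);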

When I then restarted a standby (so it restarts from another checkpoint), it
frequently crashed with various errors:
* pg_subtrans/xxx does not exist
* (warning) pg_subtrans page does not exist, assuming zero
* xid overwritten in SubTransSetParent

So I think my theory is correct.

The attached patch fixes this, although I don't like the way the knowledge of
the point up to which StartupSUBTRANS zeroes pages is handled.

Makes sense?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From cbb4e4d30ba1df5a3d370226545efe4966719867 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 2 Apr 2013 20:20:00 +0200
Subject: [PATCH] Ensure that SUBTRANS is initalized gaplessly when starting
 up HS

---
 src/backend/access/transam/xlog.c   |3 ++
 src/backend/storage/ipc/procarray.c |   53 +++
 src/include/storage/procarray.h |1 +
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 07c68ad..4a9462e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5391,6 +5391,9 @@ StartupXLOG(void)
 oldestActiveXID = checkPoint.oldestActiveXid;
 			Assert(TransactionIdIsValid(oldestActiveXID));
 
+			/* Tell procarray about the range of xids it has to deal with */
+			ProcArrayInitRecovery(ShmemVariableCache->nextXid);
+
 			/*
 			 * Startup commit log and subtrans only. Other SLRUs are not
 			 * maintained during recovery and need not be started yet.
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..53a5aa2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -470,6 +470,28 @@ ProcArrayClearTransaction(PGPROC *proc)
 }
 
 /*
+ * ProcArrayInitRecovery -- initialize recovery xid mgmt environment
+ *
+ * Remember up to where the startup process initialized the CLOG and subtrans
+ * so we can ensure its initialized gaplessly up to the point where necessary
+ * while in recovery.
+ */
+void
+ProcArrayInitRecovery(TransactionId initializedUptoXID)
+{
+	Assert(standbyState == STANDBY_INITIALIZED);
+	Assert(TransactionIdIsNormal(initializedUptoXID));
+
+	/*
+	 * we set latestObservedXid to the xid SUBTRANS has been initialized upto
+	 * so we can extend it from that point onwards when we reach a consistent
+	 * state in ProcArrayApplyRe

Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-02 Thread Andres Freund
On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> On 30 March 2013 17:21, Andres Freund  wrote:
> 
> > So if the xid is later than latestObservedXid we extend subtrans one by
> > one. So far so good. But we initialize it in
> > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> >  latestObservedXid = running->nextXid;
> >  TransactionIdRetreat(latestObservedXid);
> > Before that subtrans has initially been started up with:
> > if (wasShutdown)
> >     oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> > else
> >     oldestActiveXID = checkPoint.oldestActiveXid;
> > ...
> > StartupSUBTRANS(oldestActiveXID);
> >
> > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > can take some time until we reach consistency, it seems rather plausible
> > that there will now be a gap in initialized pages, from
> > checkPoint.oldestActiveXid to running->nextXid, if there are pages
> > in between.
> 
> That was an old bug.
> 
> StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> does that incorrectly? How?

Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
between the call to Startup* and the first call to ExtendSUBTRANS. The
latter is only called *after* we reached STANDBY_INITIALIZED via
ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS to
checkPoint.oldestActiveXid while we start to ExtendSUBTRANS from
running->nextXid - 1. There can very well be a gap in between.
The window isn't terribly big, but if you use subtransactions as heavily
as Sergey seems to, it doesn't seem unlikely that you'd hit it.

Let me come up with a testcase and patch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-04-01 Thread Simon Riggs
On 30 March 2013 17:21, Andres Freund  wrote:

> So if the xid is later than latestObservedXid we extend subtrans one by
> one. So far so good. But we initialize it in
> ProcArrayApplyRecoveryInfo() when consistency is initially reached:
>  latestObservedXid = running->nextXid;
>  TransactionIdRetreat(latestObservedXid);
> Before that subtrans has initially been started up with:
> if (wasShutdown)
>     oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> else
>     oldestActiveXID = checkPoint.oldestActiveXid;
> ...
> StartupSUBTRANS(oldestActiveXID);
>
> That means it's only initialized up to checkPoint.oldestActiveXid. As it
> can take some time until we reach consistency, it seems rather plausible
> that there will now be a gap in initialized pages, from
> checkPoint.oldestActiveXid to running->nextXid, if there are pages
> in between.

That was an old bug.

StartupSUBTRANS() now explicitly fills that gap. Are you saying it
does that incorrectly? How?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-30 Thread ajmcello
unsubscribe



Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-30 Thread Andres Freund
On 2013-03-29 14:53:26 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund  wrote:
> > I have to admit, I find it a bit confusing that so many people report a
> > bug and then immediately destroy all evidence of the bug. It just seems
> > to happen a bit too frequently.
> 
> You see, businesses usually need it up ASAP again. Sorry, I should have
> noted down the output of pg_controldata straight after it got broken; I
> just have not gotten around to it.

But the business will also need the standby working correctly in case
of a critical incident on the primary. So it should have quite an interest
in fixing bugs in that area. Yes, I realize that's not always easy to do
:(.

> That's not pg_controldata output from the broken replica though, or is
> it? I guess it's from a new standby?
> 
> That was the output from the standby that was rsync-ed on top of the
> broken one. I thought you might find something useful in it.

> Can I test your guess some other way? And what was the guess?

Don't think you can easily test it. And after reading more code I am
pretty sure my original guess was bogus. As was my second. And third ;)

But I think I see what could be going on:

During HS we maintain pg_subtrans so we can deal with more than
PGPROC_MAX_CACHED_SUBXIDS in one TX. For that we need to regularly
extend subtrans so the pages are initialized when we setup the
topxid<->subxid mapping in ProcArrayApplyXidAssignment(). The call to
ExtendSUBTRANS happens in RecordKnownAssignedTransactionIds() which is
called from several places, including ProcArrayApplyXidAssignment().
The logic it uses is:
if (TransactionIdFollows(xid, latestObservedXid))
{
    TransactionId next_expected_xid;

    /*
     * Extend clog and subtrans like we do in GetNewTransactionId() during
     * normal operation using individual extend steps. Typical case
     * requires almost no activity.
     */
    next_expected_xid = latestObservedXid;
    TransactionIdAdvance(next_expected_xid);
    while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
    {
        ExtendCLOG(next_expected_xid);
        ExtendSUBTRANS(next_expected_xid);

        TransactionIdAdvance(next_expected_xid);
    }

So if the xid is later than latestObservedXid we extend subtrans one by
one. So far so good. But we initialize it in
ProcArrayApplyRecoveryInfo() when consistency is initially reached:
 latestObservedXid = running->nextXid;
 TransactionIdRetreat(latestObservedXid);
Before that subtrans has initially been started up with:
if (wasShutdown)
    oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
else
    oldestActiveXID = checkPoint.oldestActiveXid;
...
StartupSUBTRANS(oldestActiveXID);

That means it's only initialized up to checkPoint.oldestActiveXid. As it
can take some time until we reach consistency, it seems rather plausible
that there will now be a gap in initialized pages, from
checkPoint.oldestActiveXid to running->nextXid, if there are pages
in between.
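
To put a rough number on that gap (a back-of-the-envelope sketch only: it
assumes the default 8 kB block size, i.e. 2048 four-byte pg_subtrans entries
per page, and it borrows the two xids from the pg_controldata output quoted
elsewhere in this thread, which is from the rebuilt standby rather than the
broken one):

-- purely illustrative; integer division gives the pg_subtrans page numbers
SELECT 421766298 / 2048 AS oldest_active_xid_page,   -- zeroed at startup
       423386899 / 2048 AS running_next_xid_page,    -- where extension resumes
       423386899 / 2048 - 421766298 / 2048 AS pages_in_the_gap;
-- => 205940, 206731, 791

Several hundred pages in between that nothing ever zeroes; the first lookup
that lands on one of them reads whatever happens to be on disk, or finds no
segment at all, which matches the 'file "pg_subtrans/..." doesn't exist' and
"could not access status of transaction" messages seen in this thread.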

Does that explanation sound about right to anybody else? I'll provide a
patch for the issue in a while, for now I'll try to reproduce it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread Sergey Konoplev
On Fri, Mar 29, 2013 at 2:38 PM, Andres Freund  wrote:
> I have to admit, I find it a bit confusing that so many people report a
> bug and then immediately destroy all evidence of the bug. It just seems
> to happen a bit too frequently.

You see, businesses usually need it up ASAP again. Sorry, I should have
noted down the output of pg_controldata straight after it got broken; I
just have not gotten around to it.

> That's not pg_controldata output from the broken replica though, or is
> it? I guess it's from a new standby?

That was the output from the standby that was rsync-ed on top of the
broken one. I thought you might find something useful in it.

Can I test your guess some other way? And what was the guess?

--
Kind regards,
Sergey Konoplev
Database and Software Consultant

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread Andres Freund
On 2013-03-29 14:19:59 -0700, Sergey Konoplev wrote:
> On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de wrote:
> > I think I see what's going on. Do you still have the datadir available? If
> > so, could you send the pg_controldata output?
> 
> I have already rebuilt the replica; however, below is the output in case
> it is useful:

I have to admit, I find it a bit confusing that so many people report a
bug and then immediately destroy all evidence of the bug. It just seems
to happen a bit too frequently.

> pg_control version number:922
> Catalog version number:   201204301
> Database system identifier:   5858109675396804534
> Database cluster state:   in archive recovery
> pg_control last modified: Sat 30 Mar 2013 00:21:11
> Latest checkpoint location:   258B/BDBBE748
> Prior checkpoint location:258B/86DABCB8
> Latest checkpoint's REDO location:258B/8B78BED0
> Latest checkpoint's TimeLineID:   2
> Latest checkpoint's full_page_writes: on
> Latest checkpoint's NextXID:  0/423386899
> Latest checkpoint's NextOID:  2758636912
> Latest checkpoint's NextMultiXactId:  103920
> Latest checkpoint's NextMultiOffset:  431309
> Latest checkpoint's oldestXID:225634745
> Latest checkpoint's oldestXID's DB:   16436
> Latest checkpoint's oldestActiveXID:  421766298
> Time of latest checkpoint: Fri 29 Mar 2013 22:33:01

That's not pg_controldata output from the broken replica though, or is
it? I guess it's from a new standby?

Andres
Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread Sergey Konoplev
On Fri, Mar 29, 2013 at 1:52 PM, anara...@anarazel.de wrote:
> I think I see what's going on. Do you still have the datadir available? If
> so, could you send the pg_controldata output?

I have already rebuilt the replica; however, below is the output in case it is useful:

pg_control version number:922
Catalog version number:   201204301
Database system identifier:   5858109675396804534
Database cluster state:   in archive recovery
pg_control last modified: Sat 30 Mar 2013 00:21:11
Latest checkpoint location:   258B/BDBBE748
Prior checkpoint location:258B/86DABCB8
Latest checkpoint's REDO location:258B/8B78BED0
Latest checkpoint's TimeLineID:   2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/423386899
Latest checkpoint's NextOID:  2758636912
Latest checkpoint's NextMultiXactId:  103920
Latest checkpoint's NextMultiOffset:  431309
Latest checkpoint's oldestXID:225634745
Latest checkpoint's oldestXID's DB:   16436
Latest checkpoint's oldestActiveXID:  421766298
Time of latest checkpoint: Fri 29 Mar 2013 22:33:01
Minimum recovery ending location: 258C/14AA5FA0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:hot_standby
Current max_connections setting:  550
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value

>
>
> Andres
>
>>Hardware:
>>
>>IBM System x3650 M4, 148GB RAM, NAS
>>
>>Software:
>>
>>PostgreSQL 9.2.3, yum.postgresql.org
>>CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>>
>>Configuration:
>>
>>listen_addresses = '*'
>>max_connections = 550
>>shared_buffers = 35GB
>>work_mem = 256MB
>>maintenance_work_mem = 1GB
>>bgwriter_delay = 10ms
>>bgwriter_lru_multiplier = 10.0
>>effective_io_concurrency = 32
>>wal_level = hot_standby
>>synchronous_commit = off
>>checkpoint_segments = 1024
>>checkpoint_timeout = 1h
>>checkpoint_completion_target = 0.9
>>checkpoint_warning = 5min
>>max_wal_senders = 3
>>wal_keep_segments = 2048
>>hot_standby = on
>>max_standby_streaming_delay = 5min
>>hot_standby_feedback = on
>>effective_cache_size = 133GB
>>log_directory = '/var/log/pgsql'
>>log_filename = 'postgresql-%Y-%m-%d.log'
>>log_checkpoints = on
>>log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
>>log_lock_waits = on
>>log_statement = 'ddl'
>>log_timezone = 'W-SU'
>>track_activity_query_size = 4096
>>autovacuum_max_workers = 5
>>autovacuum_naptime = 5s
>>autovacuum_vacuum_scale_factor = 0.05
>>autovacuum_analyze_scale_factor = 0.05
>>autovacuum_vacuum_cost_delay = 5ms
>>datestyle = 'iso, dmy'
>>timezone = 'W-SU'
>>lc_messages = 'en_US.UTF-8'
>>lc_monetary = 'ru_RU.UTF-8'
>>lc_numeric = 'ru_RU.UTF-8'
>>lc_time = 'ru_RU.UTF-8'
>>default_text_search_config = 'pg_catalog.russian'
>>
>>System:
>>
>># Controls the maximum shared segment size, in bytes
>>kernel.shmmax = 53287555072
>>
>># Controls the maximum number of shared memory segments, in pages
>>kernel.shmall = 13009657
>>
>># Maximum number of file-handles
>>fs.file-max = 65535
>>
>># pdflush tuning to prevent lag spikes
>>vm.dirty_ratio = 10
>>vm.dirty_background_ratio = 1
>>vm.dirty_expire_centisecs = 499
>>
>># Prevent the scheduler breakdown
>>kernel.sched_migration_cost = 500
>>
>># Turned off to provide more CPU to PostgreSQL
>>kernel.sched_autogroup_enabled = 0
>>
>># Setup hugepages
>>vm.hugetlb_shm_group = 26
>>vm.hugepages_treat_as_movable = 0
>>vm.nr_overcommit_hugepages = 512
>>
>># The Huge Page Size is 2048kB, so for 35GB shared buffers the number
>>is 17920
>>vm.nr_hugepages = 17920
>>
>># Turn off the NUMA local pages reclaim as it leads to wrong caching
>>strategy for databases
>>vm.zone_reclaim_mode = 0
>>
>>Environment:
>>
>>HUGETLB_SHM=yes
>>LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
>>export HUGETLB_SHM LD_PRELOAD
>>
>>When it is stopped:
>>
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>>restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
>>log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
>>sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
>>average=0.000 s
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
>>restart point at 2538/6E154AC0
>>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
>>completed transaction was at log time 2013-03-26 11:50:31.613948+04
>

Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread anara...@anarazel.de
Hi

Sergey Konoplev  wrote:

>Hi all,
>
>A couple of days ago I found the replica stopped after the PANIC
>message:
>
>PANIC:  WAL contains references to invalid pages
>
>When I tried to restart it I got this FATAL:
>
>FATAL:  could not access status of transaction 280557568
>
>Below is the description of the server and information from PostgreSQL
>and system logs. After googling the problem I have found nothing like
>this.
>
>Any thoughts of what it could be and how to prevent it in the future?

I think I see what's going on. Do you still have the datadir available? If
so, could you send the pg_controldata output?


Andres

>Hardware:
>
>IBM System x3650 M4, 148GB RAM, NAS
>
>Software:
>
>PostgreSQL 9.2.3, yum.postgresql.org
>CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>
>Configuration:
>
>listen_addresses = '*'
>max_connections = 550
>shared_buffers = 35GB
>work_mem = 256MB
>maintenance_work_mem = 1GB
>bgwriter_delay = 10ms
>bgwriter_lru_multiplier = 10.0
>effective_io_concurrency = 32
>wal_level = hot_standby
>synchronous_commit = off
>checkpoint_segments = 1024
>checkpoint_timeout = 1h
>checkpoint_completion_target = 0.9
>checkpoint_warning = 5min
>max_wal_senders = 3
>wal_keep_segments = 2048
>hot_standby = on
>max_standby_streaming_delay = 5min
>hot_standby_feedback = on
>effective_cache_size = 133GB
>log_directory = '/var/log/pgsql'
>log_filename = 'postgresql-%Y-%m-%d.log'
>log_checkpoints = on
>log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
>log_lock_waits = on
>log_statement = 'ddl'
>log_timezone = 'W-SU'
>track_activity_query_size = 4096
>autovacuum_max_workers = 5
>autovacuum_naptime = 5s
>autovacuum_vacuum_scale_factor = 0.05
>autovacuum_analyze_scale_factor = 0.05
>autovacuum_vacuum_cost_delay = 5ms
>datestyle = 'iso, dmy'
>timezone = 'W-SU'
>lc_messages = 'en_US.UTF-8'
>lc_monetary = 'ru_RU.UTF-8'
>lc_numeric = 'ru_RU.UTF-8'
>lc_time = 'ru_RU.UTF-8'
>default_text_search_config = 'pg_catalog.russian'
>
>System:
>
># Controls the maximum shared segment size, in bytes
>kernel.shmmax = 53287555072
>
># Controls the maximum number of shared memory segments, in pages
>kernel.shmall = 13009657
>
># Maximum number of file-handles
>fs.file-max = 65535
>
># pdflush tuning to prevent lag spikes
>vm.dirty_ratio = 10
>vm.dirty_background_ratio = 1
>vm.dirty_expire_centisecs = 499
>
># Prevent the scheduler breakdown
>kernel.sched_migration_cost = 500
>
># Turned off to provide more CPU to PostgreSQL
>kernel.sched_autogroup_enabled = 0
>
># Setup hugepages
>vm.hugetlb_shm_group = 26
>vm.hugepages_treat_as_movable = 0
>vm.nr_overcommit_hugepages = 512
>
># The Huge Page Size is 2048kB, so for 35GB shared buffers the number
>is 17920
>vm.nr_hugepages = 17920
>
># Turn off the NUMA local pages reclaim as it leads to wrong caching
>strategy for databases
>vm.zone_reclaim_mode = 0
>
>Environment:
>
>HUGETLB_SHM=yes
>LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
>export HUGETLB_SHM LD_PRELOAD
>
>When it is stopped:
>
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
>log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
>sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
>average=0.000 s
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
>restart point at 2538/6E154AC0
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
>completed transaction was at log time 2013-03-26 11:50:31.613948+04
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>restartpoint starting: xlog
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
>page 451 of relation base/16436/2686702648 is uninitialized
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>lastBlockVacuumed 0
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
>contains references to invalid pages
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>lastBlockVacuumed 0
>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
>process (PID 3773) was terminated by signal 6: Aborted
>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
>terminating any other active server processes
>
>From /var/log/messages:
>
>Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
>order:8, mode:0xd0
>Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
>2.6.32-279.22.1.el6.x86_64 #1
>Mar 26 10:50:52 tms2 kernel: : Call Trace:
>Mar 26 10:50:52 tms2 kernel: : [] ?
>__alloc_pages_nodemask+0x77f/0x940
>Mar 26 10:50:52 tms2 kernel: : [] ?
>kmem_getpages+0x62/0x170
>Mar 26 10:50:52 tms2 kernel: : [] ?
>fallback_alloc+0x1ba/0x270
>Mar 26 10:50:52 tms2 kernel: : [] ?
>cache_grow+0x2cf/0x320
>Mar 26 10:50:52 tms2 kernel: : [] ?
>cache_alloc_node+0x99/0x160
>Mar 26 10:50:52 tms2 kernel: : [] ?
>dma_pin_

[BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-28 Thread Sergey Konoplev
Hi all,

A couple of days ago I found the replica stopped after the PANIC message:

PANIC:  WAL contains references to invalid pages

When I tried to restart it I got this FATAL:

FATAL:  could not access status of transaction 280557568

Below is the description of the server and information from PostgreSQL
and system logs. After googling the problem I have found nothing like
this.

Any thoughts of what it could be and how to prevent it in the future?

Hardware:

IBM System x3650 M4, 148GB RAM, NAS

Software:

PostgreSQL 9.2.3, yum.postgresql.org
CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64

Configuration:

listen_addresses = '*'
max_connections = 550
shared_buffers = 35GB
work_mem = 256MB
maintenance_work_mem = 1GB
bgwriter_delay = 10ms
bgwriter_lru_multiplier = 10.0
effective_io_concurrency = 32
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 1024
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 5min
max_wal_senders = 3
wal_keep_segments = 2048
hot_standby = on
max_standby_streaming_delay = 5min
hot_standby_feedback = on
effective_cache_size = 133GB
log_directory = '/var/log/pgsql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_checkpoints = on
log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'W-SU'
track_activity_query_size = 4096
autovacuum_max_workers = 5
autovacuum_naptime = 5s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.05
autovacuum_vacuum_cost_delay = 5ms
datestyle = 'iso, dmy'
timezone = 'W-SU'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'

System:

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 53287555072

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 13009657

# Maximum number of file-handles
fs.file-max = 65535

# pdflush tuning to prevent lag spikes
vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 499

# Prevent the scheduler breakdown
kernel.sched_migration_cost = 500

# Turned off to provide more CPU to PostgreSQL
kernel.sched_autogroup_enabled = 0

# Setup hugepages
vm.hugetlb_shm_group = 26
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 512

# The Huge Page Size is 2048kB, so for 35GB shared buffers the number is 17920
vm.nr_hugepages = 17920

# Turn off the NUMA local pages reclaim as it leads to wrong caching
strategy for databases
vm.zone_reclaim_mode = 0

Environment:

HUGETLB_SHM=yes
LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
export HUGETLB_SHM LD_PRELOAD

When it is stopped:

2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
average=0.000 s
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
restart point at 2538/6E154AC0
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
completed transaction was at log time 2013-03-26 11:50:31.613948+04
2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
restartpoint starting: xlog
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
page 451 of relation base/16436/2686702648 is uninitialized
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
contains references to invalid pages
2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
lastBlockVacuumed 0
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
process (PID 3773) was terminated by signal 6: Aborted
2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
terminating any other active server processes

From /var/log/messages:

Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
order:8, mode:0xd0
Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
2.6.32-279.22.1.el6.x86_64 #1
Mar 26 10:50:52 tms2 kernel: : Call Trace:
Mar 26 10:50:52 tms2 kernel: : [] ?
__alloc_pages_nodemask+0x77f/0x940
Mar 26 10:50:52 tms2 kernel: : [] ? kmem_getpages+0x62/0x170
Mar 26 10:50:52 tms2 kernel: : [] ? fallback_alloc+0x1ba/0x270
Mar 26 10:50:52 tms2 kernel: : [] ? cache_grow+0x2cf/0x320
Mar 26 10:50:52 tms2 kernel: : [] ?
cache_alloc_node+0x99/0x160
Mar 26 10:50:52 tms2 kernel: : [] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [] ? __kmalloc+0x189/0x220
Mar 26 10:50:52 tms2 kernel: : [] ?
dma_pin_iovec_pages+0xb5/0x230
Mar 26 10:50:52 tms2 kernel: : [] ? lock_sock_nested+0xac/0xc0
Mar 26 10:50:52 tms2 kernel: : [] ? tcp_recvmsg+0x4ca/0xe80
Mar 26 10:50:52 tms2 kernel: : [] ?
xfs_vm_wri