On Sat, Apr 23, 2011 at 9:46 PM, Jaime Casanova <[email protected]> wrote:
> On Tue, Apr 19, 2011 at 9:47 PM, Robert Haas <[email protected]> wrote:
>>
>> That is, a standby configured such that replay lags a prescribed
>> amount of time behind the master.
>>
>> This seemed easy to implement, so I did. Patch (for 9.2, obviously)
>> attached.
>>
>
> This crashes when stopping recovery at a target (I tried with a named
> restore point and with a point in time) after executing
> pg_xlog_replay_resume(). Here is the backtrace. I will try to check
> later, but I wanted to report it first...
>
> #0 0xb7777537 in raise () from /lib/libc.so.6
> #1 0xb777a922 in abort () from /lib/libc.so.6
> #2 0x08393a19 in errfinish (dummy=0) at elog.c:513
> #3 0x083944ba in elog_finish (elevel=22, fmt=0x83d5221 "wal receiver
> still active") at elog.c:1156
> #4 0x080f04cb in StartupXLOG () at xlog.c:6691
> #5 0x080f2825 in StartupProcessMain () at xlog.c:10050
> #6 0x0811468f in AuxiliaryProcessMain (argc=2, argv=0xbfa326a8) at
> bootstrap.c:417
> #7 0x0827c2ea in StartChildProcess (type=StartupProcess) at postmaster.c:4488
> #8 0x08280b85 in PostmasterMain (argc=3, argv=0xa4c17e8) at postmaster.c:1106
> #9 0x0821730f in main (argc=3, argv=0xa4c17e8) at main.c:199

Sorry for the slow response on this - I was on vacation for a week and
my schedule got a big hole in it.

I was able to reproduce something very like this in unpatched master,
just by letting recovery pause at a named restore point, and then
resuming it.

LOG: recovery stopping at restore point "stop", time 2011-05-07 09:28:01.652958-04
LOG: recovery has paused
HINT: Execute pg_xlog_replay_resume() to continue.
(at this point I did pg_xlog_replay_resume())
LOG: redo done at 0/5000020
PANIC: wal receiver still active
LOG: startup process (PID 38762) was terminated by signal 6: Abort trap
LOG: terminating any other active server processes

I'm thinking that this code is wrong:

    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        SetRecoveryPause(true);
        recoveryPausesHere();
    }
    reachedStopPoint = true; /* see below */
    recoveryContinue = false;

I think the recoveryContinue = false assignment should not happen if
we decide to pause. That is, we should say if (recoveryPauseAtTarget
&& standbyState == STANDBY_SNAPSHOT_READY) { same as now } else
recoveryContinue = false.

I haven't tested that, though.
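
To make the suggestion concrete, here is a minimal standalone sketch of
the control flow I have in mind. It is a model, not a patch: the enum,
the globals, and the two stubbed functions are simplified stand-ins for
the real ones in xlog.c, handle_stop_point() is a hypothetical wrapper
name, and leaving reachedStopPoint = true unconditional is an
assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the real xlog.c state (assumptions). */
typedef enum
{
    STANDBY_DISABLED,
    STANDBY_INITIALIZED,
    STANDBY_SNAPSHOT_PENDING,
    STANDBY_SNAPSHOT_READY
} HotStandbyState;

static bool recoveryPauseAtTarget = true;
static HotStandbyState standbyState = STANDBY_SNAPSHOT_READY;
static bool reachedStopPoint = false;
static bool recoveryContinue = true;
static bool recoveryPaused = false;

/* Stub: the real function updates shared memory under a spinlock. */
static void
SetRecoveryPause(bool paused)
{
    recoveryPaused = paused;
}

/*
 * Stub: the real function sleeps until pg_xlog_replay_resume() is
 * called. Here we pretend the user resumed immediately.
 */
static void
recoveryPausesHere(void)
{
    assert(recoveryPaused);     /* a pause must have been requested */
    recoveryPaused = false;
}

/* Hypothetical wrapper around the stop-point logic quoted above. */
void
handle_stop_point(void)
{
    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        SetRecoveryPause(true);
        recoveryPausesHere();
        /* after a resume, recoveryContinue is left alone */
    }
    else
        recoveryContinue = false;   /* no pause: stop replay here */

    reachedStopPoint = true;        /* unchanged from current code */
}
```

With this shape, recoveryContinue is cleared only on the path where we
did not pause, which is the behavior change described above.
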
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company