Re: [HACKERS] Unintended restart after recovery error

Robert Haas Wed, 12 Nov 2014 15:31:56 -0800

On Wed, Nov 12, 2014 at 4:52 PM, Antonin Houska <[email protected]> wrote:
> Fujii Masao <[email protected]> wrote:
>
>> On Wed, Nov 12, 2014 at 6:52 PM, Antonin Houska <[email protected]> wrote:
>> > While looking at postmaster.c:reaper(), one problematic case occurred to me.
>> >
>> >
>> > 1. Startup process signals PMSIGNAL_RECOVERY_STARTED.
>> >
>> > 2. Checkpointer process is forked and immediately dies.
>> >
>> > 3. reaper() catches this failure, calls HandleChildCrash() and thus sets
>> > FatalError to true.
>> >
>> > 4. Startup process exits with non-zero status code too - either due to
>> > SIGQUIT received from HandleChildCrash or due to some other failure of the
>> > startup process itself. However, FatalError is already set, because of the
>> > previous crash of the checkpointer. Thus reaper() does not set RecoveryError.
>> >
>> > 5. As RecoveryError was not set to true, postmaster will try to restart
>> > the cluster, although it apparently should not.
>>
>> Why shouldn't postmaster restart the cluster in that case?
>>
>
> At least for the behavior to be consistent with simpler cases of failed
> recovery (e.g. any FATAL error in StartupXLOG), which end up not restarting
> the cluster.


It's true that if the startup process dies we don't try to restart,
but it's also true that if the checkpointer dies we do try to restart.
I'm not sure why this specific situation should be an exception to
that general rule.
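
To make the two rules concrete, here is a toy model of the flag handling
described in steps 3-5 above. It is only a sketch, not the actual
postmaster.c code: ToyPostmaster, toy_reap_failed_child() and
toy_will_restart() are names made up for illustration, and the only things
taken from the discussion are the FatalError and RecoveryError flags and
the order in which they get set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child kinds; the real postmaster tracks children by PID. */
typedef enum { CHILD_STARTUP, CHILD_CHECKPOINTER } ChildKind;

/* Toy stand-in for the two postmaster flags named in the thread. */
typedef struct {
    bool FatalError;     /* some child crashed */
    bool RecoveryError;  /* startup process failed before any other crash */
} ToyPostmaster;

/* Simplified model of reaper() seeing a child exit with non-zero status. */
static void toy_reap_failed_child(ToyPostmaster *pm, ChildKind kind)
{
    assert(pm != NULL);

    /* A startup failure counts as a recovery error only if no crash
     * was already recorded (step 4 in the quoted scenario). */
    if (kind == CHILD_STARTUP && !pm->FatalError)
        pm->RecoveryError = true;

    /* Any failed child is treated as a crash (step 3). */
    pm->FatalError = true;
}

/* Simplified restart decision: restart after a crash unless recovery
 * itself was marked as failed (step 5). */
static bool toy_will_restart(const ToyPostmaster *pm)
{
    assert(pm != NULL);
    return pm->FatalError && !pm->RecoveryError;
}
```

In this model, a startup failure on its own leaves toy_will_restart()
false, a checkpointer failure on its own leaves it true, and the
checkpointer-then-startup sequence from the report also leaves it true,
because FatalError is already set by the time the startup process exits.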

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


