On Mon, May 3, 2010 at 2:47 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Hmm.  When I committed that patch to fix smart shutdown on the
>> standby, we discussed the fact that the startup process can't simply
>> release its locks and die at shutdown time because the locks it holds
>> prevent other backends from seeing the database in an inconsistent
>> state.  Therefore, if we were to terminate recovery as soon as the
>> smart shutdown request is received, we might never complete, because a
>> backend might be waiting on a lock that will never get released.  If
>> that's really a danger scenario, then it follows that we might also
>> fail to shut down if we can't connect to the primary, because we might
>> not be able to replay enough WAL to release the locks the remaining
>> backends are waiting for.  That sort of looks like what is happening
>> to you, except based on your test scenario I can't figure out where
>> this came from:
>
>> FATAL:  replication terminated by primary server
>
> I suspect you have it right, because my experiments where the standby
> did shut down correctly were all done with an idle master.
>
> Seems like we could go ahead and forcibly kill the startup process *once
> all the standby backends are gone*.  There is then no need to worry
> about not releasing locks, and re-establishing a consistent state when
> we later restart is logic that we have to have anyway.

That's exactly what we already do.  The problem is that smart shutdown
doesn't actually kill off the standby backends - it waits for them to
exit on their own.  But if a backend is blocked on a lock that's never
going to get released, it never exits, and the shutdown never completes.
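
To make the stall concrete, here is a toy model of the scenario, not
PostgreSQL code.  The function name, the dictionary of backends, and the
idea of counting WAL records are all assumptions made up for this sketch:
each backend needs some number of WAL records replayed before the lock it
waits on is released, and the startup process is only killed once every
backend has exited.

```python
def simulate_smart_shutdown(backends, wal_available):
    """Toy model of smart shutdown on a standby (hypothetical, illustrative).

    backends:      dict mapping a backend name to the number of WAL records
                   that must be replayed before the lock it waits on is
                   released (0 means the backend is not blocked).
    wal_available: how many WAL records the startup process can still
                   replay, e.g. a small number if the primary is unreachable.

    Returns (completed, stuck_backends).
    """
    replayed = 0
    live = dict(backends)
    while True:
        # A backend exits on its own once it is no longer waiting on a lock.
        live = {name: need for name, need in live.items() if need > replayed}
        if not live:
            # All backends are gone, so the startup process can be killed.
            return True, []
        if replayed >= wal_available:
            # No more WAL to replay: the locks are never released.
            return False, sorted(live)
        replayed += 1


if __name__ == "__main__":
    # Enough WAL arrives to release the lock: shutdown completes.
    print(simulate_smart_shutdown({"idle": 0, "blocked": 2}, wal_available=5))
    # The primary is gone after one record: the blocked backend never exits.
    print(simulate_smart_shutdown({"idle": 0, "blocked": 2}, wal_available=1))
```

In the second call the blocked backend is reported as stuck, which is the
case where smart shutdown waits forever.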

...Robert
