On Mon, May 3, 2010 at 2:47 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Hmm. When I committed that patch to fix smart shutdown on the
>> standby, we discussed the fact that the startup process can't simply
>> release its locks and die at shutdown time because the locks it holds
>> prevent other backends from seeing the database in an inconsistent
>> state. Therefore, if we were to terminate recovery as soon as the
>> smart shutdown request is received, we might never complete, because a
>> backend might be waiting on a lock that will never get released. If
>> that's really a danger scenario, then it follows that we might also
>> fail to shut down if we can't connect to the primary, because we might
>> not be able to replay enough WAL to release the locks the remaining
>> backends are waiting for. That sort of looks like what is happening
>> to you, except based on your test scenario I can't figure out where
>> this came from:
>
>> FATAL: replication terminated by primary server
>
> I suspect you have it right, because my experiments where the standby
> did shut down correctly were all done with an idle master.
>
> Seems like we could go ahead and forcibly kill the startup process *once
> all the standby backends are gone*. There is then no need to worry
> about not releasing locks, and re-establishing a consistent state when
> we later restart is logic that we have to have anyway.

That's exactly what we already do. The problem is that smart shutdown
doesn't actually kill off the standby backends - it waits for them to
exit on their own. But if they're blocked on a lock that's never
going to get released, then they never do.

...Robert
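
To make the failure mode concrete, here is a toy model of the condition described above. It is only a sketch, not PostgreSQL code: the Backend class, the lock names, and smart_shutdown_completes() are all made up for illustration, and it assumes a lock held by the startup process is released only by replaying WAL that is still available to the standby.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Backend:
    """A standby backend; waiting_on names the startup-process lock it is blocked on."""
    name: str
    waiting_on: Optional[str] = None  # None means the backend is not blocked


def smart_shutdown_completes(
    backends: Iterable[Backend],
    held_locks: Set[str],
    replayable_releases: Set[str],
) -> bool:
    """Return True if a smart shutdown can finish in this toy model.

    held_locks: locks currently held by the startup process.
    replayable_releases: locks whose releasing WAL records are still
    available to the standby and can therefore be replayed.
    """
    # Locks that stay held forever because the WAL that would release
    # them never arrives (for example, the primary is unreachable).
    stuck = set(held_locks) - set(replayable_releases)
    # Smart shutdown waits for every backend to exit on its own; a backend
    # blocked on a stuck lock never exits, so the shutdown never completes.
    return not any(b.waiting_on in stuck for b in backends)
```

In this model, a backend waiting on a lock whose release record cannot be replayed makes the function return False, which corresponds to the hang: the startup process is only killed once all backends are gone, and that backend never goes away.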