Fujii Masao wrote:
> On Thu, Sep 17, 2009 at 5:08 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
> > Walreceiver is really a slave to the startup process. The startup
> > process decides when it's launched, and it's the startup process that
> > then waits for it to advance. But the way it's set up at the moment, the
> > startup process needs to ask the postmaster to start it up, and it
> > doesn't look very robust to me. For example, if launching walreceiver
> > fails for some reason, startup process will just hang waiting for it.
>
> I changed the postmaster to report the failure of fork of the walreceiver
> to the startup process by resetting WalRcv->in_progress, which prevents
> the startup process from getting stuck when launching walreceiver fails.
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01996.php
>
> Do you have another concern about the robustness? If yes, I'll address that.

Hmm. Without looking at the patch at all, this seems similar to how
autovacuum does things: the autovac launcher signals the postmaster that a
worker needs to be started, and the postmaster proceeds to fork a worker.
This could obviously fail for a lot of reasons. There is code in place to
notify the user when forking fails, and this is seen in the wild quite a
bit more than one would like :-(

I think it would be a good idea to have a retry mechanism in the
walreceiver startup code, so that recovery does not get stuck due to
transient problems.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
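
To make the retry suggestion above a bit more concrete, here is a rough sketch of the control flow, written in Python only for brevity. Nothing in it comes from the patch or from the PostgreSQL source: `launch_with_retry`, `request_launch`, and the attempt and delay parameters are all hypothetical names and values chosen for illustration. The assumption is that the startup process has some way to ask for a walreceiver and learn whether the launch failed (for instance via the in_progress flag being reset), and that it retries with a capped, doubling delay instead of waiting forever.

```python
import time
from typing import Callable


def launch_with_retry(
    request_launch: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call request_launch() until it reports success.

    request_launch is a hypothetical callback standing in for "ask the
    postmaster to start walreceiver and report whether the fork worked".
    Returns the number of attempts used; raises RuntimeError if every
    attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if request_launch():
            return attempt
        # Launch failed (e.g. a transient fork failure). Back off before
        # the next attempt, but do not sleep after the final one.
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)

    raise RuntimeError(f"worker launch failed after {max_attempts} attempts")
```

The bounded attempt count is a design choice for the sketch: it lets a persistent failure surface as an error rather than an endless loop, while a short-lived problem is absorbed by the retries. Whether recovery should instead retry indefinitely is a separate policy question.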