On 5 May 2011 21:05, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The major problem I'm aware of for getting rid of periodic wakeups is
> the need for child processes to notice when the postmaster has died
> unexpectedly. Your patch appears to degrade the archiver's response
> time for that really significantly, like from O(1 sec) to O(1 min),
> which I don't think is acceptable. We've occasionally kicked around
> ideas for mechanisms that would solve this problem, but nothing's gotten
> done. It doesn't seem to be an easy problem to solve portably...

Could you please expand upon this? Why is it of any consequence if the
archiver notices that the postmaster is dead after 60 seconds rather than
after 1?

Control in the archiver will simply stay in its event loop for longer than
before, until pgarch_MainLoop() finally returns. The DBA might have to kill
the archiver where previously they wouldn't have had to (they wouldn't have
had time to), but they are already required to kill any other backends
before deleting postmaster.pid, or there will be dire consequences.

Nothing important happens after waiting on the latch but before checking
PostmasterIsAlive(), and nothing important happens after the postmaster is
found to be dead. ISTM that it wouldn't be particularly bad if the archiver
were SIGKILL'd while waiting on a latch.

The only salient thread I found concerning the problem of making children
know when the postmaster died is this one:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00401.php

Fujii Masao suggests removing wal_sender_delay in that thread, and replacing
it with a generic default. That would work well with my suggestion to unify
these sorts of timeouts under a single GUC.

-- 
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
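
P.S. To put numbers on the difference, here is a minimal sketch in C of the
loop shape being discussed: wait on the latch until the timeout expires, then
check whether the postmaster is alive. It is a simulation, not PostgreSQL
code. SimState, sim_wait_latch(), sim_postmaster_is_alive() and
detection_delay_ms() are hypothetical names made up for illustration, the
clock is simulated, and the latch is assumed never to be set early.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins, NOT the PostgreSQL API. Time is simulated so the
 * sketch runs instantly and deterministically.
 */
typedef struct SimState
{
    long now_ms;              /* simulated clock */
    long postmaster_death_ms; /* simulated time at which the postmaster dies */
} SimState;

/* Stand-in for PostmasterIsAlive(): true until the simulated death time. */
static bool
sim_postmaster_is_alive(const SimState *s)
{
    return s->now_ms < s->postmaster_death_ms;
}

/* Stand-in for a latch wait that is never set early: it always times out. */
static void
sim_wait_latch(SimState *s, long timeout_ms)
{
    s->now_ms += timeout_ms;
}

/*
 * Run "wait on latch, then check the postmaster" until the death is
 * noticed. Returns how long after the death the loop noticed it, in ms.
 */
long
detection_delay_ms(long timeout_ms, long postmaster_death_ms)
{
    SimState s = {0, postmaster_death_ms};

    assert(timeout_ms > 0);
    assert(postmaster_death_ms >= 0);

    for (;;)
    {
        sim_wait_latch(&s, timeout_ms);
        if (!sim_postmaster_is_alive(&s))
            return s.now_ms - s.postmaster_death_ms;
    }
}
```

With the postmaster dying 1 ms after the loop starts, detection_delay_ms(1000, 1)
returns 999 and detection_delay_ms(60000, 1) returns 59999. Under these
assumptions the delay is bounded by the timeout, which is the O(1 sec) versus
O(1 min) difference in question.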