On Jul8, 2011, at 14:40 , Heikki Linnakangas wrote: > On 08.07.2011 13:58, Florian Pflug wrote: >> On Jul8, 2011, at 11:57 , Peter Geoghegan wrote: >>> On 7 July 2011 19:15, Robert Haas<robertmh...@gmail.com> wrote: >>>>> I'm not concerned about the possibility of spurious extra cycles of >>>>> auxiliary process event loops - should I be? >>>> >>>> A tight loop would be bad, but an occasional spurious wake-up seems >>>> harmless. >>> >>> We should also assert !PostmasterIsAlive() from within the latch code >>> after waking due to apparent Postmaster death. The reason that I don't >>> want to follow Florian's suggestion to check it in production is that >>> I don't know what to do if the postmaster turns out to be alive. Why >>> is it more reasonable to try again than to just return? >> >> I'd say return, but don't indicate postmaster death in the return value >> if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in >> WaitLatch(), and return indicating postmaster death whenever select() >> says so, and put the burden of re-checking on the callers. > > I put the burden on the callers. Removing the return value from WaitLatch() > altogether just makes life unnecessarily difficult for callers that could > safely use that information, even if you sometimes get spurious wakeups. In > particular, the coding in pgarch.c is nicer if you can simply check the > return code for WL_TIMEOUT, rather than call time(NULL) to figure out if the > timeout was reached. > > Attached is a new version of this patch. PostmasterIsAlive() now uses read() > on the pipe instead of kill().
I did notice a few (very minor) loose ends... SyncRepWaitForLSN() says /* * Wait on latch for up to 60 seconds. This allows us to check for * postmaster death regularly while waiting. Note that timeout here * does not necessarily release from loop. */ WaitLatch(&MyProc->waitLatch, 60000000L); I guess that 60-second timeout is unnecessary now that we'll wake up on postmaster death anyway. Also, none of the callers of WaitLatch() seems to actually inspect the WL_POSTMASTER_DEATH bit of the result. We might want to make their !PostmasterIsAlive() check conditional on the WL_POSTMASTER_DEATH bit being set. At least in the case of SyncRepWaitForLSN(), it seems that avoiding the extra read() syscall might be beneficial. Maybe these cleanups would better be done in a separate patch, though... best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers