Attached patch addresses Fujii's more recent concerns.

On 22 June 2011 04:54, Fujii Masao <masao.fu...@gmail.com> wrote:

> +WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
> +WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket
> sock, long timeout)
>
> If 'wakeEvent' is zero, we cannot get out of WaitLatch(). Something like
> Assert(waitEvents != 0) is required? Or, WaitLatch() should always wait
> on latch even when 'waitEvents' is zero?

Well, not waking when the client has not specified an event to wake on
is the correct thing to do in that case. It would also be inherently
undesirable, so I'd be happy to guard against it using an assertion.
Both implementations now use one.

> In unix_latch.c, select() in WaitLatchOrSocket() checks the timeout only when
> WL_TIMEOUT is set in 'wakeEvents'. OTOH, in win32_latch.c,
> WaitForMultipleObjects()
> in WaitLatchOrSocket() always checks the timeout even if WL_TIMEOUT is not
> given. Is this intentional?

No, it's a mistake. Fixed.

> +               else if (rc == WAIT_OBJECT_0 + 2 &&
> +                                ((wakeEvents & WL_SOCKET_READABLE) || 
> (wakeEvents & WL_SOCKET_WRITEABLE)))
>
> Another corner case: when WL_POSTMASTER_DEATH and WL_SOCKET_READABLE
> are given and 'sock' is set to PGINVALID_SOCKET, we can wrongly pass through 
> the
> above check. If this OK?

I see your point - Assert(sock != PGINVALID_SOCKET) could be violated.
We fix the issue now by setting and checking a bool that simply
indicates that we're interested in sockets.

>                rc = WaitForMultipleObjects(numevents, events, FALSE,
>                                                           (timeout >= 0) ? 
> (timeout / 1000) : INFINITE);
> -               if (rc == WAIT_FAILED)
> +               if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
> +                        !PostmasterIsAlive(true))
>
> After WaitForMultipleObjects() detects the death of postmaster,
> WaitForSingleObject()
> is called in PostmasterIsAlive(). In this case, what code does
> WaitForSingleObject() return?
> I wonder if WaitForSingleObject() returns the code other than
> WAIT_TIMEOUT and really
> can detect the death of postmaster.

As noted up-thread, the fact that the archiver does wake and finish on
Postmaster death can be clearly observed on Windows. I'm not sure why
you wonder that, as this is fairly standard use of
PostmasterIsAlive().  I've verified that the waitLatch() call
correctly reports Postmaster death in its return code on Windows, and
indeed that it actually wakes up.

Are you suggesting that there should be a defensive else if{ } for the
case where PostmasterIsAlive() incorrectly reports that the PM is
alive due to some implementation related race-condition, and we've
already considered every other possibility? Well, I suppose that's not
necessary, because we will loop until we find a reason - it's okay to
miss it the first time around, because whatever caused
WaitForMultipleObjects() to wake up will cause it to immediately
return for the next iteration.

In any case, we don't rely on the PostmasterIsAlive() call at all
anymore, so it doesn't matter. We just look at rc's value now, as we
do for every other case, though it's a bit trickier when checking
Postmaster death. Similarly, we don't have a PostmasterIsAlive() call
within the unix latch implementation anymore.

> +       if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_GETFL) < 0)
> +       {
> +               ereport(FATAL,
> +                       (errcode_for_socket_access(),
> +                        errmsg("failed to set the postmaster death watching 
> fd's flags:
> %s", strerror(errno))));
> +       }
>
> Is the above check really required? It's harmless, but looks unnecessary.

Yes, it's not possible for it to detect an error condition now. Removed.

> '%m' should be used instead of '%s' and 'strerror(errno)'.

It is of course better to use the simpler, built-in facility. Fixed.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4952d22..bfe6bcd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10158,7 +10158,7 @@ retry:
 					/*
 					 * Wait for more WAL to arrive, or timeout to be reached
 					 */
-					WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+					WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 					ResetLatch(&XLogCtl->recoveryWakeupLatch);
 				}
 				else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..6d2e3a1 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -188,22 +189,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +215,15 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
+
+	Assert(wakeEvents != 0);
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +231,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +242,31 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +274,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +290,33 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;				/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE ) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true; /* data available in socket */
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
+		}
+		if ((wakeEvents & WL_POSTMASTER_DEATH) &&
+			 FD_ISSET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask))
+		{
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
 		}
 	}
+	while(!found);
 	waiting = false;
 
 	return result;
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 4bcf7b7..21e406f 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -23,6 +23,7 @@
 #include <unistd.h>
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
@@ -81,43 +82,51 @@ DisownLatch(volatile Latch *latch)
 	latch->owner_pid = 0;
 }
 
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 int
-WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
-				  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, SOCKET sock, long timeout)
 {
 	DWORD		rc;
-	HANDLE		events[3];
+	HANDLE		events[4];
 	HANDLE		latchevent;
 	HANDLE		sockevent = WSA_INVALID_EVENT;	/* silence compiler */
 	int			numevents;
 	int			result = 0;
+	bool		found = false;
+	bool		checking_socket = false;
+
+	Assert(wakeEvents != 0);
 
 	latchevent = latch->event;
 
 	events[0] = latchevent;
 	events[1] = pgwin32_signal_event;
 	numevents = 2;
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		int			flags = 0;
+		checking_socket = true;
 
-		if (forRead)
+		if (wakeEvents & WL_SOCKET_READABLE)
 			flags |= FD_READ;
-		if (forWrite)
+		if (wakeEvents & WL_SOCKET_WRITEABLE)
 			flags |= FD_WRITE;
 
 		sockevent = WSACreateEvent();
 		WSAEventSelect(sock, sockevent, flags);
 		events[numevents++] = sockevent;
 	}
+	if (wakeEvents & WL_POSTMASTER_DEATH)
+	{
+		events[numevents++] = PostmasterHandle;
+	}
 
-	for (;;)
+	do
 	{
 		/*
 		 * Reset the event, and check if the latch is set already. If someone
@@ -127,24 +136,41 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 		 */
 		if (!ResetEvent(latchevent))
 			elog(ERROR, "ResetEvent failed: error code %d", (int) GetLastError());
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		rc = WaitForMultipleObjects(numevents, events, FALSE,
-							   (timeout >= 0) ? (timeout / 1000) : INFINITE);
-		if (rc == WAIT_FAILED)
+								(timeout >= 0 && (wakeEvents & WL_TIMEOUT)) ? (timeout / 1000) : INFINITE);
+		/* Whether or not we're interested in sockets affects which
+		 * rc value indicates that PostmasterHandle indicated PM death
+		 */
+		if ( (wakeEvents & WL_POSTMASTER_DEATH) &&
+			 rc ==  WAIT_OBJECT_0 + (checking_socket? 3:2))
+		{
+			/* Postmaster died */
+			result |= WL_POSTMASTER_DEATH;
+			found = true;
+		}
+		else if (rc == WAIT_FAILED)
 			elog(ERROR, "WaitForMultipleObjects() failed: error code %d", (int) GetLastError());
 		else if (rc == WAIT_TIMEOUT)
 		{
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
 		else if (rc == WAIT_OBJECT_0 + 1)
 			pgwin32_dispatch_queued_signals();
-		else if (rc == WAIT_OBJECT_0 + 2)
+		else if (rc == WAIT_OBJECT_0 + 2 && checking_socket) /* If we are checking_socket, it will be WAIT_OBJECT_0 + 2 */
 		{
 			WSANETWORKEVENTS resEvents;
 
@@ -155,17 +181,24 @@ WaitLatchOrSocket(volatile Latch *latch, SOCKET sock, bool forRead,
 				ereport(FATAL,
 						(errmsg_internal("failed to enumerate network events: %i", (int) GetLastError())));
 
-			if ((forRead && resEvents.lNetworkEvents & FD_READ) ||
-				(forWrite && resEvents.lNetworkEvents & FD_WRITE))
-				result = 2;
-			break;
+			if ((wakeEvents & WL_SOCKET_READABLE) && (resEvents.lNetworkEvents & FD_READ))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;
+			}
+			if ((wakeEvents & WL_SOCKET_WRITEABLE) && (resEvents.lNetworkEvents & FD_WRITE))
+			{
+				result |= WL_SOCKET_WRITEABLE;
+				found = true;
+			}
 		}
 		else if (rc != WAIT_OBJECT_0)
 			elog(ERROR, "unexpected return code from WaitForMultipleObjects(): %d", (int) rc);
 	}
+	while(!found);
 
 	/* Clean up the handle we created for the socket */
-	if (sock != PGINVALID_SOCKET && (forRead || forWrite))
+	if (sock != PGINVALID_SOCKET && ((wakeEvents & WL_SOCKET_READABLE) || (wakeEvents & WL_SOCKET_WRITEABLE)))
 	{
 		WSAEventSelect(sock, sockevent, 0);
 		WSACloseEvent(sockevent);
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index b2fe9a1..db9401a 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -11,6 +11,8 @@
  */
 #include "postgres.h"
 #include "postmaster/fork_process.h"
+#include "postmaster/postmaster.h"
+
 
 #include <fcntl.h>
 #include <time.h>
@@ -19,13 +21,14 @@
 #include <unistd.h>
 
 #ifndef WIN32
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
  * child in the parent process.
  */
 pid_t
-fork_process(void)
+do_fork_process(bool remain_postmaster)
 {
 	pid_t		result;
 
@@ -61,6 +64,17 @@ fork_process(void)
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
+		/*
+		 * Usually, we're forking to create a new, distinct process. That process
+		 * should release the postmaster death watch handle, which is required by
+		 * the implementation, as described in unix_latch.c.
+		 *
+		 * Less frequently, we want to fork for some other reason (such as for
+		 * silent_mode), and the child process is intended to become the new
+		 * postmaster. It should therefore retain the death watch handle.
+		 */
+		if (!remain_postmaster)
+			ReleasePostmasterDeathWatchHandle();
 
 		/*
 		 * By default, Linux tends to kill the postmaster in out-of-memory
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index b40375a..a56fe92 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -40,6 +40,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
+#include "storage/latch.h"
 #include "storage/pg_shmem.h"
 #include "storage/pmsignal.h"
 #include "utils/guc.h"
@@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false;
 static volatile sig_atomic_t wakened = false;
 static volatile sig_atomic_t ready_to_stop = false;
 
+/*
+ * Latch that archiver loop waits on until it is awakened by
+ * signals, each of which there is a handler for
+ */
+static volatile Latch mainloop_latch;
+
 /* ----------
  * Local function forward declarations
  * ----------
@@ -228,6 +235,8 @@ PgArchiverMain(int argc, char *argv[])
 
 	MyProcPid = getpid();		/* reset MyProcPid */
 
+	InitLatch(&mainloop_latch); /* initialise latch used in main loop, now that we are a subprocess */
+
 	MyStartTime = time(NULL);	/* record Start Time for logging */
 
 	/*
@@ -282,6 +291,8 @@ ArchSigHupHandler(SIGNAL_ARGS)
 {
 	/* set flag to re-read config file at next convenient time */
 	got_SIGHUP = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGTERM signal handler for archiver process */
@@ -295,6 +306,8 @@ ArchSigTermHandler(SIGNAL_ARGS)
 	 * archive commands.
 	 */
 	got_SIGTERM = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR1 signal handler for archiver process */
@@ -303,6 +316,8 @@ pgarch_waken(SIGNAL_ARGS)
 {
 	/* set flag that there is work to be done */
 	wakened = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /* SIGUSR2 signal handler for archiver process */
@@ -311,6 +326,8 @@ pgarch_waken_stop(SIGNAL_ARGS)
 {
 	/* set flag to do a final cycle and shut down afterwards */
 	ready_to_stop = true;
+	/* Let the waiting loop iterate */
+	SetLatch(&mainloop_latch);
 }
 
 /*
@@ -334,6 +351,13 @@ pgarch_MainLoop(void)
 
 	do
 	{
+		/*
+		 * There shouldn't be anything for the archiver to do except to wait
+		 * on a latch ... however, the archiver exists to protect our data,
+		 * so she wakes up occasionally to allow herself to be proactive.
+		 */
+		ResetLatch(&mainloop_latch);
+
 		/* When we get SIGUSR2, we do one more archive cycle, then exit */
 		time_to_stop = ready_to_stop;
 
@@ -371,25 +395,27 @@ pgarch_MainLoop(void)
 		}
 
 		/*
-		 * There shouldn't be anything for the archiver to do except to wait
-		 * for a signal ... however, the archiver exists to protect our data,
-		 * so she wakes up occasionally to allow herself to be proactive.
+		 * Wait on latch, until various signals are received, or
+		 * until a poll will be forced by PGARCH_AUTOWAKE_INTERVAL
+		 * having passed since last_copy_time, or on the postmaster's
+		 * untimely demise.
 		 *
-		 * On some platforms, signals won't interrupt the sleep.  To ensure we
-		 * respond reasonably promptly when someone signals us, break down the
-		 * sleep into 1-second increments, and check for interrupts after each
-		 * nap.
+		 * The caveat about signals resetting the timeout of
+		 * WaitLatch()/select() on some platforms can be safely disregarded,
+		 * because we handle all expected signals, and all handlers
+		 * call SetLatch() where that matters anyway
 		 */
-		while (!(wakened || ready_to_stop || got_SIGHUP ||
-				 !PostmasterIsAlive(true)))
-		{
-			time_t		curtime;
 
-			pg_usleep(1000000L);
+		if (!time_to_stop) /* Don't wait during last iteration */
+		{
+			time_t		 curtime = time(NULL);
+			unsigned int timeout_secs  = (unsigned int) PGARCH_AUTOWAKE_INTERVAL -
+					(unsigned int) (curtime - last_copy_time);
+			WaitLatch(&mainloop_latch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout_secs * 1000000L);
 			curtime = time(NULL);
 			if ((unsigned int) (curtime - last_copy_time) >=
 				(unsigned int) PGARCH_AUTOWAKE_INTERVAL)
-				wakened = true;
+				wakened = true; /* wakened by timeout - this wasn't a SIGHUP, etc */
 		}
 
 		/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 6572292..5a3e059 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -443,6 +443,7 @@ typedef struct
 	HANDLE		syslogPipe[2];
 #else
 	int			syslogPipe[2];
+	int			postmaster_alive_fds[2];
 #endif
 	char		my_exec_path[MAXPGPATH];
 	char		pkglib_path[MAXPGPATH];
@@ -472,6 +473,13 @@ static void ShmemBackendArrayRemove(Backend *bn);
 #define EXIT_STATUS_0(st)  ((st) == 0)
 #define EXIT_STATUS_1(st)  (WIFEXITED(st) && WEXITSTATUS(st) == 1)
 
+/*
+ * 2 file descriptors that monitoring if postmaster is alive.
+ * First is POSTMASTER_FD_WATCH, second is POSTMASTER_FD_OWN.
+ */
+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif
 
 /*
  * Postmaster main entry point
@@ -491,6 +499,15 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+#ifndef WIN32
+	/*
+	 * Initialise mechanism that allows waiting latch clients
+	 * to wake on postmaster death, to finish their
+	 * remaining business
+	 */
+	InitPostmasterDeathWatchHandle();
+#endif
+
 	/*
 	 * for security, no dir or file created can be group or other accessible
 	 */
@@ -1312,7 +1329,7 @@ pmdaemonize(void)
 	/*
 	 * Okay to fork.
 	 */
-	pid = fork_process();
+	pid = fork_process_remain_postmaster();
 	if (pid == (pid_t) -1)
 	{
 		write_stderr("%s: could not fork background process: %s\n",
@@ -4758,6 +4775,9 @@ save_backend_variables(BackendParameters *param, Port *port,
 
 	memcpy(&param->syslogPipe, &syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&param->postmaster_alive_fds, &postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
 	strlcpy(param->my_exec_path, my_exec_path, MAXPGPATH);
 
 	strlcpy(param->pkglib_path, pkglib_path, MAXPGPATH);
@@ -4973,6 +4993,10 @@ restore_backend_variables(BackendParameters *param, Port *port)
 
 	memcpy(&syslogPipe, &param->syslogPipe, sizeof(syslogPipe));
 
+#ifndef WIN32
+	memcpy(&postmaster_alive_fds, &param->postmaster_alive_fds, sizeof(postmaster_alive_fds));
+#endif
+
 	strlcpy(my_exec_path, param->my_exec_path, MAXPGPATH);
 
 	strlcpy(pkglib_path, param->pkglib_path, MAXPGPATH);
@@ -5088,5 +5112,79 @@ pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired)
 	/* Queue SIGCHLD signal */
 	pg_queue_signal(SIGCHLD);
 }
+#else
+/*
+ * Initialise one and only handle for monitoring postmaster death.
+ *
+ * Called once from the postmaster, so that child processes can subsequently
+ * monitor if their parent is dead. We open up an anoymous pipe, and have child
+ * processes block on a select() call that examines if the read file descriptor
+ * is ready for reading. They do so through a latch.
+ *
+ * Child processes are responsible for releasing the death watch handler, so
+ * that only the postmaster holds it, and a select() on the fd returns upon the
+ * one and only holder (the postmaster) dying.
+ *
+ * This is a trick that obviates the need for auxiliary backends to have tight
+ * polling loops where they check if the postmaster is alive. We do this because
+ * that pattern results in an excessive number of wakeups per second when idle.
+ */
+void
+InitPostmasterDeathWatchHandle(void)
+{
+	/*
+	 * Create pipe. The postmaster is deemed dead if
+	 * no process has the writing end (POSTMASTER_FD_OWN) open.
+	 */
+	Assert(MyProcPid == PostmasterPid);
+	if (pipe(postmaster_alive_fds))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg( "pipe() call failed to create pipe to monitor postmaster death: %m")));
+	}
+	/*
+	 * Set O_NONBLOCK to allow checking for the fd's presence with a select() call
+	 */
+	if (fcntl(postmaster_alive_fds[POSTMASTER_FD_WATCH], F_SETFL, O_NONBLOCK))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to set the postmaster death watching fd's flags: %m")));
+	}
+}
 
-#endif   /* WIN32 */
+/*
+ * Release postmaster death watch handle.
+ *
+ * Important: This must be called immediately after a process
+ * forks from the postmaster. Otherwise, latch clients will
+ * not wake up on postmaster death, even if they have requested
+ * to.
+ *
+ * Even some hypothetical backend that doesn't care about postmaster
+ * death has a responsibility to call this function - otherwise,
+ * some other latch client backend could wait in vain to be informed
+ * of postmaster death, because the irresponsible backend held open
+ * the ownership file descriptor and outlived the postmaster.
+ *
+ * We call the function within the fork machinery to handle all cases,
+ * so backends need not bother with this themselves.
+ */
+void
+ReleasePostmasterDeathWatchHandle(void)
+{
+	/* MyProcPid won't have been set yet */
+	Assert(PostmasterPid != getpid());
+	/* Please don't ask twice */
+	Assert(postmaster_alive_fds[POSTMASTER_FD_OWN] != -1);
+	/* Release parent's ownership fd - only postmaster should hold it */
+	if (close(postmaster_alive_fds[POSTMASTER_FD_OWN]))
+	{
+		ereport(FATAL,
+			(errcode_for_socket_access(),
+			 errmsg("failed to close file descriptor associated with postmaster death in child process")));
+	}
+	postmaster_alive_fds[POSTMASTER_FD_OWN] = -1;
+}
+#endif
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 2b52d16..7cf6206 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -171,7 +171,7 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 		 * postmaster death regularly while waiting. Note that timeout here
 		 * does not necessarily release from loop.
 		 */
-		WaitLatch(&MyProc->waitLatch, 60000000L);
+		WaitLatch(&MyProc->waitLatch, WL_LATCH_SET | WL_TIMEOUT, 60000000L);
 
 		/* Must reset the latch before testing state. */
 		ResetLatch(&MyProc->waitLatch);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..27cc350 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -805,8 +805,9 @@ WalSndLoop(void)
 			}
 
 			/* Sleep */
-			WaitLatchOrSocket(&MyWalSnd->latch, MyProcPort->sock,
-							  true, pq_is_send_pending(),
+			WaitLatchOrSocket(&MyWalSnd->latch,
+							  WL_LATCH_SET | WL_SOCKET_READABLE | (pq_is_send_pending()? WL_SOCKET_WRITEABLE:0) |  WL_TIMEOUT,
+							  MyProcPort->sock,
 							  sleeptime * 1000L);
 
 			/* Check for replication timeout */
diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
index 0553fd2..e0abe5d 100644
--- a/src/include/postmaster/fork_process.h
+++ b/src/include/postmaster/fork_process.h
@@ -12,6 +12,8 @@
 #ifndef FORK_PROCESS_H
 #define FORK_PROCESS_H
 
-extern pid_t fork_process(void);
+extern pid_t do_fork_process(bool remain_postmaster);
+#define fork_process() do_fork_process(false)
+#define fork_process_remain_postmaster() do_fork_process(true)
 
 #endif   /* FORK_PROCESS_H */
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 25cc84a..497cf51 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -33,6 +33,25 @@ extern bool restart_after_crash;
 
 #ifdef WIN32
 extern HANDLE PostmasterHandle;
+#else
+/*
+ * Constants that represent which of a pair of fds given
+ * to pipe() is watched and owned in the context of
+ * dealing with postmaster death
+ */
+#define POSTMASTER_FD_WATCH 0
+#define POSTMASTER_FD_OWN 1
+extern int postmaster_alive_fds[2];
+/*
+ * On unix, it is necessary to Init monitoring
+ * of postmaster being alive
+ */
+extern void InitPostmasterDeathWatchHandle(void);
+/*
+ * It is also necessary to call ReleasePostmasterDeathWatchHandle()
+ * after forking from PM for the Unix implementation
+ */
+extern void ReleasePostmasterDeathWatchHandle(void);
 #endif
 
 extern const char *progname;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 03ec071..6865ac7 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -38,9 +38,8 @@ extern void InitLatch(volatile Latch *latch);
 extern void InitSharedLatch(volatile Latch *latch);
 extern void OwnLatch(volatile Latch *latch);
 extern void DisownLatch(volatile Latch *latch);
-extern bool WaitLatch(volatile Latch *latch, long timeout);
-extern int WaitLatchOrSocket(volatile Latch *latch, pgsocket sock,
-				  bool forRead, bool forWrite, long timeout);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout);
 extern void SetLatch(volatile Latch *latch);
 extern void ResetLatch(volatile Latch *latch);
 
@@ -56,4 +55,11 @@ extern void latch_sigusr1_handler(void);
 #define latch_sigusr1_handler()
 #endif
 
+/* Bitmasks for events that may wake-up WaitLatch() clients */
+#define WL_LATCH_SET         (1 << 0)
+#define WL_SOCKET_READABLE   (1 << 1)
+#define WL_SOCKET_WRITEABLE  (1 << 2)
+#define WL_TIMEOUT           (1 << 3)
+#define WL_POSTMASTER_DEATH  (1 << 4)
+
 #endif   /* LATCH_H */
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to