On 07/05/14 22:32, Robert Haas wrote:
On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek <p...@2ndquadrant.com> wrote:
I've committed the portion of your patch which does this, with pretty
extensive changes.  I moved the function which resets the crash times
to bgworker.c, renamed it, and made it just reset all of the crash
times unconditionally; there didn't seem to be any point in skipping
the irrelevant ones, so it seemed best to keep things simple.  I also
moved the call from reaper() where you had it to
PostmasterStateMachine(), because the old placement did not work for
me when I carried out the following steps:

1. Kill a background worker with a 60-second restart time using
SIGTERM, so that it does exit(1).
2. Before it restarts, kill a regular backend with SIGQUIT.

Let me know if you think I've got it wrong.


No I think it's fine, I didn't try that combination and just wanted to put it as deep in the call as possible.


(2) If a shmem-connected backend fails to release the deadman switch
or exits with an exit code other than 0 or 1, we crash-and-restart.  A
non-shmem-connected backend never causes a crash-and-restart.

+1

I did this as a separate commit,
e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for
ReleasePostmasterChildSlot inside the if statement.   Then I realized
that was bogus, so I just pushed
eee6cf1f337aa488a20e9111df446cdad770e645 to fix that.  Hopefully it's
OK now.

The fixed one looks ok to me.


(3) When a background worker exits without triggering a
crash-and-restart, an exit code of precisely 0 causes the worker to be
unregistered; any other exit code has no special effect, so
bgw_restart_time controls.

+1

This isn't done yet.


Unless I am missing something this change was included in every patch I sent - setting rw->rw_terminate = true; in CleanupBackgroundWorker for zero exit code + comment changes. Or do you have objections to this approach?

Anyway missing parts attached.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index fd32d6c..5f86100 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -166,10 +166,12 @@ typedef struct BackgroundWorker
   </para>
 
   <para>
-   Background workers are expected to be continuously running; if they exit
-   cleanly, <command>postgres</> will restart them immediately.  Consider doing
-   interruptible sleep when they have nothing to do; this can be achieved by
-   calling <function>WaitLatch()</function>.  Make sure the
+   Background workers are normally expected to be running continuously;
+   they get restarted automaticaly in case of crash (see 
+   <structfield>bgw_restart_time</structfield> documentation for details),
+   but if they exit cleanly, <command>postgres</> will not restart them.
+   Consider doing interruptible sleep when they have nothing to do; this can be
+   achieved by calling <function>WaitLatch()</function>. Make sure the
    <literal>WL_POSTMASTER_DEATH</> flag is set when calling that function, and
    verify the return code for a prompt exit in the emergency case that
    <command>postgres</> itself has terminated.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 64c9722..b642f37 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -884,8 +884,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
  * running but is no longer.
  *
  * In the latter case, the worker may be stopped temporarily (if it is
- * configured for automatic restart, or if it exited with code 0) or gone
- * for good (if it is configured not to restart and exited with code 1).
+ * configured for automatic restart and exited with code 1) or gone for
+ * good (if it exited with code other than 1 or if it is configured not
+ * to restart).
  */
 BgwHandleStatus
 GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 79d1c50..a0331e6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2849,7 +2849,11 @@ CleanupBackgroundWorker(int pid,
 		if (!EXIT_STATUS_0(exitstatus))
 			rw->rw_crashed_at = GetCurrentTimestamp();
 		else
+		{
+			/* Zero exit status means terminate */
 			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
+		}
 
 		/*
 		 * Additionally, for shared-memory-connected workers, just like a
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index c9550cc..d3dd1f2 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -16,10 +16,11 @@
  * that the failure can only be transient (fork failure due to high load,
  * memory pressure, too many processes, etc); more permanent problems, like
  * failure to connect to a database, are detected later in the worker and dealt
- * with just by having the worker exit normally.  A worker which exits with a
- * return code of 0 will be immediately restarted by the postmaster.  A worker
- * which exits with a return code of 1 will be restarted after the configured
- * restart interval, or never if that interval is set to BGW_NEVER_RESTART.
+ * with just by having the worker exit normally. A worker which exits with
+ * a return code of 0 will never be restarted and will be removed from worker
+ * list. A worker which exits with a return code of 1 will be restarted after
+ * the configured restart interval, or never if that interval is set to
+ * BGW_NEVER_RESTART.
  * The TerminateBackgroundWorker() function can be used to terminate a
  * dynamically registered background worker; the worker will be sent a SIGTERM
  * and will not be restarted after it exits.  Whenever the postmaster knows
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to