Re: [HACKERS] bgworker crashed or not?

Petr Jelinek Sun, 04 May 2014 06:58:31 -0700

On 30/04/14 23:35, Robert Haas wrote:

On Mon, Apr 28, 2014 at 4:19 PM, Petr Jelinek <p...@2ndquadrant.com> wrote:

On 28/04/14 16:27, Robert Haas wrote:


It seems we have consensus on what to do about this, but what we
haven't got is a patch.


If you mean the consensus that exit status 0 should mean permanent stop then
I think the patch can be as simple as attached.


Hmm.  Well, at the very least, you need to update the comment.

Obviously, also docs, the above was just demonstration that the patch issimple (and also all I could manage to write and test on the way fromairport to hotel).


The attached patch is more proper.

One thing I hit during going through the code is the quickdie handler.Originally it exited with exit code 0 so that bgworker was alwaysrestarted immediately, but since we changed meaning of exit code 0 tokeep bgworker dead forever I changed the exit code in quickdie to 2which seems to be consistent with how other backends work. One sideeffect is that when we get to normal backed crash recovery whereeverything gets restarted, bgworker restart will be affected bybgw_restart_time.

This should hopefully not be too big issue for normal bgworkers as it'san exceptional situation. And for dynamic bgworkers this is probablymore proper behavior anyway since currently they would get restartedimmediately without their parent process knowing about them anymore (asit also got restarted). The above BTW leads me to thinking that we mightwant to include a note somewhere in the docs saying that for mostuse-cases dynamic bgworkers should use bgw_restart_time =BGW_NEVER_RESTART (this is IMHO true for basically all use-cases exceptwhere you use them as static ones and just need to run dynamic number ofthem), but I didn't include such note in the patch.



--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index fd32d6c..83dfe54 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -101,7 +101,8 @@ typedef struct BackgroundWorker
    <command>postgres</command> should wait before restarting the process, in
    case it crashes.  It can be any positive value,
    or <literal>BGW_NEVER_RESTART</literal>, indicating not to restart the
-   process in case of a crash.
+   process in case of a crash. Note that this also affects restarts during
+   backend crash recovery.
   </para>
 
   <para>
@@ -166,10 +167,12 @@ typedef struct BackgroundWorker
   </para>
 
   <para>
-   Background workers are expected to be continuously running; if they exit
-   cleanly, <command>postgres</> will restart them immediately.  Consider doing
-   interruptible sleep when they have nothing to do; this can be achieved by
-   calling <function>WaitLatch()</function>.  Make sure the
+   Background workers are normally expected to be running continuously;
+   they get restarted automaticaly in case of crash (see 
+   <structfield>bgw_restart_time</structfield> documentation for details),
+   but if they exit cleanly, <command>postgres</> will not restart them.
+   Consider doing interruptible sleep when they have nothing to do; this can be
+   achieved by calling <function>WaitLatch()</function>. Make sure the
    <literal>WL_POSTMASTER_DEATH</> flag is set when calling that function, and
    verify the return code for a prompt exit in the emergency case that
    <command>postgres</> itself has terminated.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f65a803..010326e 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -472,13 +472,13 @@ bgworker_quickdie(SIGNAL_ARGS)
 	on_exit_reset();
 
 	/*
-	 * Note we do exit(0) here, not exit(2) like quickdie.  The reason is that
-	 * we don't want to be seen this worker as independently crashed, because
-	 * then postmaster would delay restarting it again afterwards.  If some
-	 * idiot DBA manually sends SIGQUIT to a random bgworker, the "dead man
-	 * switch" will ensure that postmaster sees this as a crash.
+	 * Note we do exit(2) here, not exit(0) for same reasons quickdie does.
+	 * Another reason is that bgworker is not restarted after exit(0).
+	 *
+	 * Because of how bgworker restarts are handled, the restart might be
+	 * delayed or not happen at all based on the configuration.
 	 */
-	exit(0);
+	exit(2);
 }
 
 /*
@@ -832,8 +832,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
  * running but is no longer.
  *
  * In the latter case, the worker may be stopped temporarily (if it is
- * configured for automatic restart, or if it exited with code 0) or gone
- * for good (if it is configured not to restart and exited with code 1).
+ * configured for automatic restart and exited with code 1) or gone for
+ * good (if it exited with code other than 1 or if it is configured not
+ * to restart).
  */
 BgwHandleStatus
 GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b573fd8..be211f0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2848,7 +2848,10 @@ CleanupBackgroundWorker(int pid,
 		if (!EXIT_STATUS_0(exitstatus))
 			rw->rw_crashed_at = GetCurrentTimestamp();
 		else
+		{
 			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
+		}
 
 		/*
 		 * Additionally, for shared-memory-connected workers, just like a
diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h
index 78d6c0e..a6fcbfd 100644
--- a/src/include/postmaster/bgworker.h
+++ b/src/include/postmaster/bgworker.h
@@ -16,10 +16,11 @@
  * that the failure can only be transient (fork failure due to high load,
  * memory pressure, too many processes, etc); more permanent problems, like
  * failure to connect to a database, are detected later in the worker and dealt
- * with just by having the worker exit normally.  A worker which exits with a
- * return code of 0 will be immediately restarted by the postmaster.  A worker
- * which exits with a return code of 1 will be restarted after the configured
- * restart interval, or never if that interval is set to BGW_NEVER_RESTART.
+ * with just by having the worker exit normally. A worker which exits with
+ * a return code of 0 will never be restarted and will be removed from worker
+ * list. A worker which exits with a return code of 1 will be restarted after
+ * the configured restart interval, or never if that interval is set to
+ * BGW_NEVER_RESTART.
  * The TerminateBackgroundWorker() function can be used to terminate a
  * dynamically registered background worker; the worker will be sent a SIGTERM
  * and will not be restarted after it exits.  Whenever the postmaster knows

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] bgworker crashed or not?

Reply via email to