Tom Lane wrote:
> Alvaro Herrera <alvhe...@commandprompt.com> writes:
> > If sigusr1_handler needs rewriting, don't all the other sighandler as
> > well?
>
> It does not, and neither do they. I'm not sure what happened here but
> it wasn't the fault of the postmaster's organization of signal handlers.
>
> It does seem that we ought to change things so that there's a bit more
> delay before trying to re-launch a failed autovac worker, though.
> Whatever caused this was effectively turning the autovac logic into
> a fork-bomb engine. I'm not thinking of just postponing the relaunch
> into the main loop, but ensuring at least a few hundred msec delay before
> we try again.
Would it be enough to move the kill() syscall into ServerLoop in
postmaster.c instead of letting it be called in the signal handler, per
the attached patch? This way the signal is not delayed, but we exit the
signal handler before sending it.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.587
diff -c -p -r1.587 postmaster.c
*** src/backend/postmaster/postmaster.c	7 Aug 2009 05:58:55 -0000	1.587
--- src/backend/postmaster/postmaster.c	21 Aug 2009 21:21:05 -0000
*************** bool redirection_done = false; /* stder
*** 290,295 ****
--- 290,297 ----
  
  /* received START_AUTOVAC_LAUNCHER signal */
  static volatile sig_atomic_t start_autovac_launcher = false;
+ /* the launcher needs to be signalled to communicate some condition */
+ static volatile bool avlauncher_needs_signal = false;
  
  /*
   * State for assigning random salts and cancel keys.
*************** ServerLoop(void)
*** 1391,1396 ****
--- 1393,1406 ----
  		if (PgStatPID == 0 && pmState == PM_RUN)
  			PgStatPID = pgstat_start();
  
+ 		/* If we need to signal the autovacuum launcher, do so now */
+ 		if (avlauncher_needs_signal)
+ 		{
+ 			avlauncher_needs_signal = false;
+ 			if (AutoVacPID != 0)
+ 				kill(AutoVacPID, SIGUSR1);
+ 		}
+ 
  		/*
  		 * Touch the socket and lock file every 58 minutes, to ensure that
  		 * they are not removed by overzealous /tmp-cleaning tasks. We assume
*************** StartAutovacuumWorker(void)
*** 4354,4365 ****
  	/*
  	 * Report the failure to the launcher, if it's running. (If it's not, we
  	 * might not even be connected to shared memory, so don't try to call
! 	 * AutoVacWorkerFailed.)
  	 */
  	if (AutoVacPID != 0)
  	{
  		AutoVacWorkerFailed();
! 		kill(AutoVacPID, SIGUSR1);
  	}
  }
  
--- 4364,4379 ----
  	/*
  	 * Report the failure to the launcher, if it's running. (If it's not, we
  	 * might not even be connected to shared memory, so don't try to call
! 	 * AutoVacWorkerFailed.) Note that we also need to signal it so that it
! 	 * responds to the condition, but we don't do that here, instead waiting
! 	 * for ServerLoop to do it. This way we avoid a ping-pong signalling in
! 	 * quick succession between the autovac launcher and postmaster in case
! 	 * things get ugly.
  	 */
  	if (AutoVacPID != 0)
  	{
  		AutoVacWorkerFailed();
! 		avlauncher_needs_signal = true;
  	}
  }
  
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers