On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> When the postmaster recovers from a backend or worker crash, it resets bg
> worker's crash time (rw->rw_crashed_at) so that the bgworker will
> immediately restart (ResetBackgroundWorkerCrashTimes).
>
> But resetting rw->rw_crashed_at to 0 means that we have lost the information
> that the bgworker had actually crashed. So later when postmaster tries to
> find any workers that should start (maybe_start_bgworker), it treats this
> worker as a new worker, as against treating it as one that had crashed and
> is to be restarted. So for this bgworker, it does not consider
> BGW_NEVER_RESTART:
>
>     if (rw->rw_crashed_at != 0)
>     {
>         if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
>         {
>             ForgetBackgroundWorker(&iter);
>             continue;
>         }
>         ....
>         ....
>
> That means, it will not remove the worker, and it will be restarted. Now if
> the worker again crashes, postmaster would keep on repeating the crash and
> restart cycle for the whole system.
>
> From what I understand, BGW_NEVER_RESTART applies even to a crashed server.
> But let me know if I am missing anything.
>
> I think we either have to retain the knowledge that the worker has crashed
> using some new field, or else, we should reset the crash time only if it is
> not flagged BGW_NEVER_RESTART.
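
The second option (reset the crash time only when the worker is not flagged
BGW_NEVER_RESTART) can be sketched in isolation as below. This is a minimal,
self-contained model and not the actual postmaster code: SketchWorker,
sketch_reset_crash_times and sketch_should_forget are hypothetical names, and
the struct is a cut-down stand-in for RegisteredBgWorker that keeps only the
two fields discussed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins (assumptions for this sketch). In PostgreSQL the
 * real definitions come from the bgworker headers. */
#define BGW_NEVER_RESTART (-1)

typedef int64_t TimestampTz;

/* Hypothetical cut-down model of a registered background worker. */
typedef struct SketchWorker
{
    int         bgw_restart_time;   /* seconds, or BGW_NEVER_RESTART */
    TimestampTz rw_crashed_at;      /* 0 means "no crash recorded" */
} SketchWorker;

/*
 * Crash recovery step: clear the crash time so restartable workers start
 * immediately, but leave it set for BGW_NEVER_RESTART workers so the fact
 * that they crashed is not lost.
 */
static void
sketch_reset_crash_times(SketchWorker *workers, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (workers[i].bgw_restart_time != BGW_NEVER_RESTART)
            workers[i].rw_crashed_at = 0;
    }
}

/*
 * Mirrors the check quoted above: a worker that has crashed and must never
 * be restarted should be forgotten rather than started again.
 */
static bool
sketch_should_forget(const SketchWorker *w)
{
    return w->rw_crashed_at != 0 &&
           w->bgw_restart_time == BGW_NEVER_RESTART;
}
```

With this shape, a never-restart worker that crashed keeps a nonzero
rw_crashed_at across the reset, so the later start-up check still sees it as
crashed and drops it instead of restarting it.
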
I think you're right, and I think we should do the second of those.
Thanks for tracking this down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company