On Tue, Mar 17, 2015 at 1:33 AM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> When the postmaster recovers from a backend or worker crash, it resets bg
> worker's crash time (rw->rw_crashed_at) so that the bgworker will
> immediately restart (ResetBackgroundWorkerCrashTimes).
>
> But resetting rw->rw_crashed_at to 0 means that we have lost the information
> that the bgworker had actuallly crashed. So later when postmaster tries to
> find any workers that should start (maybe_start_bgworker), it treats this
> worker as a new worker, as against treating it as one that had crashed and
> is to be restarted. So for this bgworker, it does not consider
> BGW_NEVER_RESTART :
>
> if (rw->rw_crashed_at != 0) { if (rw->rw_worker.bgw_restart_time ==
> BGW_NEVER_RESTART) { ForgetBackgroundWorker(&iter); continue; } .... ....
> That means, it will not remove the worker, and it will be restarted. Now if
> the worker again crashes, postmaster would keep on repeating the crash and
> restart cycle for the whole system.
>
> From what I understand, BGW_NEVER_RESTART applies even to a crashed server.
> But let me know if I am missing anything.
>
> I think we either have to retain the knowledge that the worker has crashed
> using some new field, or else, we should reset the crash time only if it is
> not flagged BGW_NEVER_RESTART.

I think you're right, and I think we should do the second of those.
Thanks for tracking this down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to