On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek <p...@2ndquadrant.com> wrote: >> What I'm inclined to do is change the logic so that: >> >> (1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so >> that anything which is still registered gets restarted immediately. > > Yes, that's quite obvious change which I missed completely :).
I've committed the portion of your patch which does this, with pretty extensive changes. I moved the function which resets the crash times to bgworker.c, renamed it, and made it just reset all of the crash times unconditionally; there didn't seem to be any point in skipping the irrelevant ones, so it seemed best to keep things simple. I also moved the call from reaper() where you had it to PostmasterStateMachine(), because the old placement did not work for me when I carried out the following steps: 1. Kill a background worker with a 60-second restart time using SIGTERM, so that it does exit(1). 2. Before it restarts, kill a regular backend with SIGQUIT. Let me know if you think I've got it wrong. >> (2) If a shmem-connected backend fails to release the deadman switch >> or exits with an exit code other than 0 or 1, we crash-and-restart. A >> non-shmem-connected backend never causes a crash-and-restart. > > +1 I did this as a separate commit, e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for ReleasePostmasterChildSlot inside the if statement. Then I realized that was bogus, so I just pushed eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's OK now. >> (3) When a background worker exits without triggering a >> crash-and-restart, an exit code of precisely 0 causes the worker to be >> unregistered; any other exit code has no special effect, so >> bgw_restart_time controls. > > +1 This isn't done yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers