On 06/05/14 19:05, Robert Haas wrote:
Which brings up another point: the behavior of non-shmem-connected
workers is totally bizarre.  An exit status other than 0 or 1 is not
treated as a crash requiring a restart, but failure to disengage the
deadman switch is still treated as a crash requiring a restart.  Why?
If the workers are not shmem-connected, then no crash requires a
system-wide restart.  Of course, there's the tiny problem that we
aren't actually unmapping shared memory from supposedly non-shmem
connected workers, which is a different bug, but ignoring that for the
moment there's no reason for this logic to be like this.

Agreed.

What I'm inclined to do is change the logic so that:

(1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so
that anything which is still registered gets restarted immediately.

Yes, that's quite an obvious change, which I missed completely :).

(2) If a shmem-connected backend fails to release the deadman switch
or exits with an exit code other than 0 or 1, we crash-and-restart.  A
non-shmem-connected backend never causes a crash-and-restart.

+1

(3) When a background worker exits without triggering a
crash-and-restart, an exit code of precisely 0 causes the worker to be
unregistered; any other exit code has no special effect, so
bgw_restart_time controls.

+1
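
To check that I read the three rules the same way, here is a rough
standalone sketch of the decision logic. It is not the postmaster code:
the enum, the two structs and both function names are made up for
illustration, and only rw_crashed_at and bgw_restart_time are names
taken from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical outcome labels, not PostgreSQL identifiers. */
typedef enum
{
    WORKER_UNREGISTER,          /* exit code 0: forget the worker */
    WORKER_RESTART_LATER,       /* bgw_restart_time controls the restart */
    WORKER_CRASH_AND_RESTART    /* system-wide crash-and-restart */
} WorkerExitAction;

/* Hypothetical summary of how one background worker exited. */
typedef struct
{
    bool shmem_connected;           /* worker was attached to shared memory */
    bool released_deadman_switch;   /* worker disengaged the deadman switch */
    int  exit_code;
} WorkerExit;

/* Simplified stand-in for a registered worker; only the crash time is kept. */
typedef struct
{
    long rw_crashed_at;             /* 0 means "no crash recorded" */
} SketchWorker;

/* Rules (2) and (3): decide what to do when a worker exits. */
static WorkerExitAction
classify_worker_exit(const WorkerExit *we)
{
    /* Rule (2): only a shmem-connected worker can force crash-and-restart. */
    if (we->shmem_connected &&
        (!we->released_deadman_switch ||
         (we->exit_code != 0 && we->exit_code != 1)))
        return WORKER_CRASH_AND_RESTART;

    /* Rule (3): an exit code of precisely 0 unregisters the worker. */
    if (we->exit_code == 0)
        return WORKER_UNREGISTER;

    /* Rule (3): any other exit code has no special effect. */
    return WORKER_RESTART_LATER;
}

/* Rule (1): after crash-and-restart, clear every recorded crash time so
 * that anything still registered is restarted immediately. */
static void
reset_crash_times(SketchWorker *workers, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
        workers[i].rw_crashed_at = 0;
}
```

The point of ordering the checks this way is that a non-shmem-connected
worker can never reach the crash-and-restart branch, whatever its exit
code or deadman switch state.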


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

