Noah Misch wrote:
> On Thu, Jun 20, 2013 at 12:33:25PM -0400, Alvaro Herrera wrote:
> > MauMau wrote:
> > > Here, "reliable" means that the database server is certainly shut
> > > down when pg_ctl returns, not telling a lie that "I shut down the
> > > server processes for you, so you do not have to be worried that some
> > > postgres process might still remain and write to disk".  I suppose
> > > reliable shutdown is crucial especially in HA cluster.  If pg_ctl
> > > stop -mi gets stuck forever when there is an unkillable process (in
> > > what situations does this happen? OS bug, or NFS hard mount?), I
> > > think the DBA has to notice this situation from the unfinished
> > > pg_ctl, investigate the cause, and take corrective action.
> >
> > So you're suggesting that keeping postmaster up is a useful sign that
> > the shutdown is not going well?  I'm not really sure about this.  What
> > do others think?
>
> It would be valuable for "pg_ctl -w -m immediate stop" to have the property
> that a subsequent start attempt will not fail due to the presence of some
> backend still attached to shared memory.  (Maybe that's true anyway or can be
> achieved a better way; I have not investigated.)

Well, the only case where a process that's been SIGKILLed does not go
away, as far as I know, is when it is in some uninterruptible sleep due
to in-kernel operations that get stuck.  Personally, I have never seen
this happen in any case other than some network filesystem getting
disconnected, or a disk that doesn't respond.  And whenever the
filesystem starts to respond again, the process gets out of its sleep
only to die due to the signal.

So a subsequent start attempt will either find that the filesystem is
not responding, in which case it'll probably fail to work properly
anyway (presumably the filesystem corresponds to part of the data
directory), or that it has revived, in which case the old backends have
already gone away.

If we leave postmaster running after SIGKILLing its children, the only
thing we can do is have it continue to SIGKILL processes every few
seconds until they die (or just sit around doing nothing until they all
die).  I don't think this will have a different effect than postmaster
going away trusting the first SIGKILL to do its job eventually.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
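
For illustration only, here is a minimal Python sketch of the "send
SIGKILL once, then just wait" behaviour discussed above.  It is not
postmaster code: the function name sigkill_and_reap and its timeout and
poll_interval parameters are hypothetical, and it assumes a POSIX system
where the listed PIDs are children of the calling process.  A child
stuck in uninterruptible sleep would simply remain in the returned set
until the kernel operation completes; re-sending the signal would not
change that.

```python
import os
import signal
import time


def sigkill_and_reap(pids, timeout=5.0, poll_interval=0.1):
    """Send SIGKILL once to each child PID, then poll until they exit.

    Returns the set of PIDs that had not been reaped when the timeout
    expired (empty set if every child went away).
    """
    remaining = set(pids)

    # Deliver the signal exactly once per process.
    for pid in remaining:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone; the waitpid loop below will drop it

    deadline = time.monotonic() + timeout
    while remaining and time.monotonic() < deadline:
        for pid in list(remaining):
            try:
                # WNOHANG: returns (0, 0) if the child is still running.
                reaped, _status = os.waitpid(pid, os.WNOHANG)
            except ChildProcessError:
                reaped = pid  # not our child, or already reaped
            if reaped == pid:
                remaining.discard(pid)
        if remaining:
            time.sleep(poll_interval)

    return remaining
```

A caller could report any PIDs left in the returned set instead of
signalling them again, since the pending SIGKILL takes effect as soon as
the process leaves its uninterruptible sleep.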