From: "Alvaro Herrera" <alvhe...@2ndquadrant.com>
I will go with 5 seconds, then.

OK, I agree.


My point is that there is no difference.  For one thing, once we enter
immediate shutdown state, and sigkill has been sent, no further action
is taken.  Postmaster will just sit there indefinitely until processes
are gone.  If we were to make it repeat SIGKILL until they die, that
would be different.  However, repeating SIGKILL is pointless, because if
they didn't die when they first received it, they will still not die
when they receive it a second time.  Also, if they're in uninterruptible sleep
and don't die, then they will die as soon as they get out of that state;
no further queries will get processed, no further memory access will be
done.  So there's no harm in their remaining there until the underlying
storage returns to life, ISTM.
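The escalation policy described above (wait a grace period, send SIGKILL exactly once, then simply wait for the children to exit) can be modelled as a small decision function. This is a minimal Python sketch, not the postmaster's actual C code. `ShutdownState`, `next_signal` and `SIGKILL_DELAY_SECS` are hypothetical names, and it assumes the children were already sent SIGQUIT when the immediate shutdown started.

```python
import signal
from dataclasses import dataclass
from typing import Optional

# Grace period after the initial SIGQUIT before escalating (5 seconds, as agreed above).
SIGKILL_DELAY_SECS = 5.0


@dataclass
class ShutdownState:
    """Hypothetical model of the immediate-shutdown state."""
    start_time: float           # when SIGQUIT was sent to the children (assumption)
    sigkill_sent: bool = False  # SIGKILL is sent at most once


def next_signal(state: ShutdownState, now: float,
                children_alive: bool) -> Optional[signal.Signals]:
    """Return the signal to send at time `now`, or None to keep waiting."""
    if not children_alive:
        return None  # everything has exited; shutdown can complete
    if state.sigkill_sent:
        # Repeating SIGKILL is pointless: a process that survived the first one
        # is in uninterruptible sleep and dies as soon as it leaves that state.
        return None
    if now - state.start_time >= SIGKILL_DELAY_SECS:
        state.sigkill_sent = True
        return signal.SIGKILL
    return None  # still inside the grace period
```

Once SIGKILL has gone out, the function returns None forever, which mirrors the point that the postmaster just sits there until the processes are gone rather than re-sending the signal.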

Here, "reliable" means that the database server is certainly shut
down when pg_ctl returns, rather than pg_ctl falsely claiming "I shut down the
server processes for you, so you do not have to worry that some
postgres process might still remain and write to disk".  I suppose
reliable shutdown is crucial, especially in an HA cluster.  If pg_ctl
stop -mi gets stuck forever when there is an unkillable process (in
what situations does this happen? OS bug, or NFS hard mount?), I
think the DBA has to notice this situation from the unfinished
pg_ctl, investigate the cause, and take corrective action.

So you're suggesting that keeping postmaster up is a useful sign that
the shutdown is not going well?  I'm not really sure about this.  What
do others think?

I think you are right, and there is no harm in leaving postgres processes in an unkillable state. I'd like to leave the decision to you and/or others.

One concern is that umount would fail in such a situation because postgres still has some files open on the filesystem, which in a traditional HA cluster is on the shared disk. However, STONITH should resolve the problem by terminating the stuck node... I just feel it is strange for umount to fail because of remaining postgres processes when pg_ctl stop -mi reported success.

IIRC the only other interesting tweak I did was rename the
SignalAllChildren() function to TerminateChildren().  I did this because
it doesn't really signal all children; syslogger and dead_end backends
are kept around.  So the original name was a bit misleading.  And we
couldn't really name it SignalAlmostAllChildren(), could we ..
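The naming point can be illustrated with a filter that signals every child except the ones that are deliberately kept around. This is a hypothetical Python sketch, not PostgreSQL's C implementation: `Child`, its `kind` labels, and the `send` callback (for which `os.kill` could be passed in real use) are assumptions made for illustration.

```python
import signal
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Child:
    pid: int
    kind: str               # hypothetical labels, e.g. "backend", "syslogger"
    dead_end: bool = False  # dead-end backend that is left alone


def terminate_children(children: List[Child], sig: int,
                       send: Callable[[int, int], None]) -> List[int]:
    """Signal every child except the syslogger and dead-end backends.

    Returns the pids that were signalled, in order.
    """
    signalled = []
    for child in children:
        if child.kind == "syslogger" or child.dead_end:
            continue  # kept around, so this is not "signal all children"
        send(child.pid, sig)
        signalled.append(child.pid)
    return signalled
```

Because some children are skipped on purpose, a name like "SignalAllChildren" would overstate what such a function does.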

I see.  Thank you.

Regards
MauMau


