Bill Cole wrote:
> At 10:20 PM +0400 5/14/08, Eugene wrote:
>>Hi people,
>>
>>>From: Adam McDougall <[EMAIL PROTECTED]>
>>>I would just like to mention a circumstance that happened to me this
>>>Sunday. We had a total power outage in our building, longer than our
>>>UPS's could last and we don't have a generator for servers (nor is it
>>>economical or needed). When the power came back on, my local NTP server
>>>came on at the same time as my mail servers, as well a majority of my
>>>other servers. My servers tried to step their time to be in sync with
>>>my local NTP server, which was still busy trying to sync itself with
>>>outside sources, which takes a while, so my mail servers did not get an
>>>answer. Later, dovecot died because the time finally synced, and I
>>>found out why pretty quick (have seen this before) but this was an
>>>unusual situation.
>>>
>>>My point is, we had an unusual circumstance, and even though I've taken
>>>steps to have my mail servers sync their time at boot and run ntpd
>>>afterwards, there are some circumstances in which this is not enough,
>>>and dovecot still died. Its not always because someone was lazy about
>>>their time setup.
>>
>>My point exactly. It's amazing how some people are quick to ramble
>>about someone else's administrative incompetence without taking time
>>to read the situation.
>
> I most certainly did read your description of the situation, and my
> use of the phrase "administrative incompetence" should not be taken
> personally. I did not say (or mean) "administrator incompetence" and
> would not try to make that sort of judgment at a distance.
>
>> (One person even suggested hacking the dovecot startup script to
>>run ntpdate -- useless as ntpd already occupies the ports).
>
> That's one of the things that "ntpdate -u" is good for.
>
>
>>Fact is, ntpd can take unpredictable delay before the initial
>>time-step. Delay that can't be controlled, and it would be
>>unreasonable to delay starting mail services until it is guaranteed
>>to complete. Then, dovecot dies, and admin (who is not always
>>immediately available) has to start it manually anyway (especially
>>as it is not clear what to do with possibly unsynced timestamps) --
>>only after the unnecessary downtime.
>
> Or you can have an external watchdog that re-launches Dovecot if it
> dies. This approach handles a broader set of failure modes and on
> some OS's is a built-in feature of the startup subsystem.
>
> Because of the fact that Dovecot may be running in an environment
> with an external watchdog, perhaps one like launchd or classical
> SysV/Solaris init that can catch the exit of the process it spawned
> and use it to trigger an immediate respawn. This means that adding an
> internal respawn inside Dovecot that will not cause breakage on any
> system is not as simple as it may seem.
>
>>So, the question is: why on earth can't we add a single line of code
>>to dovecot to restart itself after terminating?
>
> You can do just that yourself if you believe that it is the best
> option for your circumstances and adequate to handle the problem you
> are having. One line of code might well do the trick you want on your
> system. If Timo puts the functionality in the code he distributes, it
> will need to be a great deal more than one line of code.
>
The problem I see is that an external script that *unconditionally* relaunches dovecot could be a terrible problem. It's better for dovecot to do the restart itself for this particular failure, because dovecot is the only one that knows it was just a date issue and that relaunching is safe.
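
To make the distinction concrete, here is a rough sketch of a watchdog that relaunches only for one known-safe failure instead of unconditionally. It assumes the supervised program signals "I exited only because the clock moved" through a distinct exit status. That status (CLOCK_JUMP_EXIT) and the helper names are made up for illustration; I am not claiming dovecot defines any such exit code. Without a signal like that, an external script cannot tell the date case from any other crash.

```python
import subprocess
import sys
import time

# Hypothetical exit status meaning "exited only because the clock moved".
# This value is an assumption for illustration, not something dovecot defines.
CLOCK_JUMP_EXIT = 75


def should_relaunch(exit_code, restarts_so_far, max_restarts=3):
    """Relaunch only for the known-safe failure, and only a bounded number of times."""
    return exit_code == CLOCK_JUMP_EXIT and restarts_so_far < max_restarts


def supervise(cmd, max_restarts=3, delay=0.0):
    """Run cmd in the foreground and relaunch it while should_relaunch() allows.

    Returns (last_exit_code, restarts_performed).
    """
    restarts = 0
    while True:
        code = subprocess.call(cmd)
        if not should_relaunch(code, restarts, max_restarts):
            # Any other failure, or too many restarts: stop and leave it to the admin.
            return code, restarts
        restarts += 1
        time.sleep(delay)


if __name__ == "__main__":
    # Usage: python watchdog.py <command> [args...]
    last_code, _ = supervise(sys.argv[1:], delay=1.0)
    sys.exit(last_code)
```

The restart cap is there so that a clock that keeps jumping does not turn into a respawn loop. Any other exit status is left alone for the admin to look at.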