Hi,

On Wed, Oct 17, 2018 at 09:47:57PM +0100, Simon McVittie wrote:
> However, it leaves the default as "fail hard", which I'm not convinced
> is the most appropriate thing for systems that lack an experienced
> sysadmin (which are the systems where defaults matter most, because an
> inexperienced user is the least able to make an informed decision about
> where they should deviate from defaults).
I think that's where we disagree, so allow me to focus on that.

I think everyone would agree that when a service fails to (re)start upon
package installation or upgrade, there is a problem, and that this
problem needs to be reported in whatever way is most appropriate (if
not, we have a bigger disagreement than I thought and we need to take a
step back ;-)

The question that remains is "how". Currently, Debian has four ways of
informing a system administrator of such failures:

- Log a message to stdout and/or stderr. This is liable to scroll by
  unnoticed, and therefore is not a reliable way to inform the system
  administrator. For that reason, I don't think it's a good idea.
- Log a message to syslog and/or the systemd journal. This will not
  scroll by, but relies on the system administrator to actively hunt for
  problems in system logs, which they probably won't do unless and until
  they notice that the daemon isn't running anymore (and by that time it
  may be too late).
- Produce a debconf error note. This is mildly better than the above
  two, since debconf error notes are shown at the highest priority and
  will therefore only be hidden if debconf is configured to be
  noninteractive; in that case, debconf will send an email to root. On
  systems without a configured MTA, this will not help; and for daemons
  where a failure to restart is a catastrophe that needs to be resolved
  ASAP, such as sshd, this might not be desirable.
- Exit from postinst with a nonzero exit status. This is unlikely to be
  missed by system administrators; however, it has several disadvantages
  that were pointed out by other people during this discussion.

I think it is perfectly fine to have the TC say that "failures to
restart a service must be reported, either by exiting nonzero, or by
another appropriate action", without going into detail about what those
other actions could be.

> policy-rc.d also has some practical integration issues.
> It normally relies on putting an unpackaged file in /usr/sbin (unless
> you have installed policyrcd-script-zg2), and it's common for tools
> like debootstrap and debian-installer to create and delete policy-rc.d
> to suppress service startup while carrying out bootstrap operations.
> One Debian derivative that I'm involved in (SteamOS) is *meant* to have
> a policy-rc.d, but we recently discovered that it has always been
> deleted at the end of the debian-installer run, and so doesn't exist in
> practice.

I think that problem is not something that should be resolved by this
discussion.

I'll readily admit that I did not actually test any of the suggestions I
made wrt policy-rc.d. There are other issues with it too; I'm thinking
of filing a wishlist bug to have it replaced by something better.

On top of that, policy-rc.d has always irked me as a somewhat awkward
interface: it is the only type of Debian-specific configuration that
does not go into /etc, but instead requires you to write a script in
/usr/sbin. This is confusing, as shown by debian-installer removing it
unconditionally. In an ideal world, the policies currently implementable
through policy-rc.d would be configuration snippets in a run-parts style
directory. The "just drop a script somewhere" idea is a poorly defined
interface which is inflexible and inappropriate for a distribution. But
"policy-rc.d should be replaced by something better" is not an
appropriate response to the question "what should happen when a service
fails to restart in postinst".

Also related to this problem is what happens when postinst fails for
reasons other than "the daemon doesn't restart".
While that is probably the most likely reason for postinst failures
today, it is by no means the only one; so if you say that "postinst
failing because of daemon restart failing" is something that should
never happen, I think you should then also provide guidelines as to
when, exactly, a postinst should be allowed to fail (and muck up the
whole system).

-- 
To the thief who stole my anti-depressants: I hope you're happy

  -- seen somewhere on the Internet on a photo of a billboard