Bug#904558: What should happen when maintscripts fail to restart a service

Tollef Fog Heen Fri, 21 Sep 2018 12:57:41 -0700

]] Wouter Verhelst 

> On Tue, Sep 18, 2018 at 10:04:26PM +0200, Tollef Fog Heen wrote:


[...]

> > The API provided by a package being in the configured state is not
> > whether the relevant daemon is running or not; that is runtime and can
> > and will change many times while the package is in the configured state,
> > so dpkg dependencies are not useful for expressing «this service must be
> > running».
> 
> No. But it *is* a useful way to express "this service must be able to
> run".

That's not what «configured» means, though.  Run «apt install foo ; rm
/etc/foo.conf» and the daemon will be in a «running, but can't restart»
state, while the package is still configured in dpkg terms.
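
To make the distinction concrete, here is a minimal Python sketch that
asks dpkg and systemd separately.  The package name «foo» and the unit
name «foo.service» are placeholders, the helper names are made up for
illustration, and it assumes dpkg-query and systemctl are available:

```python
import subprocess


def dpkg_configured(status_field: str) -> bool:
    """True if dpkg's ${Status} field reports a fully configured package."""
    # A configured package reports three words, e.g. "install ok installed".
    parts = status_field.split()
    return len(parts) == 3 and parts[2] == "installed"


def query(cmd: list[str]) -> str:
    """Run a command and return its stdout, or "" if it cannot be run."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return done.stdout.strip()


def describe(package: str, unit: str) -> str:
    """Report the packaging state and the runtime state side by side."""
    status = query(["dpkg-query", "-W", "-f=${Status}", package])
    active = query(["systemctl", "is-active", unit])
    return (f"{package}: configured={dpkg_configured(status)}; "
            f"{unit}: {active or 'unknown'}")
```

Calling describe("foo", "foo.service") before and after stopping the
daemon should show the second half change while the first half stays
the same, which is the point: the two states are tracked independently.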

> Additionally, if something fails to restart, then that is a serious
> problem that I, as a system administrator, would like to know about.
> Failure to configure a package signals that there is a serious problem
> that I need to fix, so that informs me.

I think monitoring should be implemented using monitoring tools, so if
you actually care if a service is up, you should monitor it rather than
relying on postinsts failing or succeeding.

Alternatively, you could just add «systemctl is-system-running» to a
post-dpkg-invoke hook; it'll tell you if there are daemons that have
failed.
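
As a rough sketch of such a hook, the Python script below runs
«systemctl is-system-running» and prints a warning when systemd reports
the «degraded» state, which is what it prints when one or more units
have failed.  The function names are hypothetical, and the choice to
only warn (and never fail the dpkg run) is an assumption of this sketch:

```python
#!/usr/bin/env python3
"""Warn about failed systemd units after a dpkg run (illustrative sketch)."""
import subprocess
import sys


def has_failed_units(state: str) -> bool:
    """True if `systemctl is-system-running` output means units have failed."""
    # systemd prints "degraded" when the system is up but some unit failed.
    return state.strip() == "degraded"


def main() -> None:
    try:
        done = subprocess.run(
            ["systemctl", "is-system-running"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return  # no usable systemctl here; stay silent
    if has_failed_units(done.stdout):
        print("warning: failed units present; see 'systemctl --failed'",
              file=sys.stderr)


if __name__ == "__main__":
    main()  # always exits 0, so the hook never fails the dpkg run
```

With apt, this could be wired in through an apt.conf.d fragment along
the lines of «DPkg::Post-Invoke { "/usr/local/sbin/check-failed-units";
};», where the script path is an assumption.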

[...]

> There are really only two[1] reasons why a daemon could fail to restart:
> 
> - The maintainer made a mistake in the default configuration, and the
>   user didn't make any changes so the old conffiles are being replaced
>   by the new ones, or the package is being newly installed; now the
>   daemon encounters a syntax error. This is a bug, plain and simple, and
>   catching bugs earlier rather than later is a good idea, which will
>   happen if the daemon restart failure causes a postinst failure.
> - The maintainer made no mistake, but the upgrading user made some local
>   changes, so the conffile system ensures that the syntactic differences
>   in the configuration are not incorporated and the daemon fails to
>   restart. As a system administrator, I would want to know when
>   something like that happens sooner rather than later, so that I can
>   fix it (also sooner rather than later). Failing to finish postinst
>   correctly ensures that that does happen.

In addition to this: Any number of runtime problems.  The disk might be
full.  The service might try to look up a user whose name is in LDAP and
the network is down and thus the user lookup fails.  Some hardware the
service needs is not plugged in or doesn't work correctly.  Data files
are corrupted.  Out of memory.  I'm sure you can come up with more. :-)

This then also ties into what the semantics of «daemon is started»
should be: is it that the service has started, or that it is working?
What should happen if you, on a host with no network connectivity (or
one that is just heavily firewalled), do «apt install ntp»?  Should it
wait until the clock is synced (effectively forever in this case)?
Should the postinst fail until you've fixed the firewall?

> [1] There is also the possibility of "the package ships with incomplete
>     configuration on purpose, because there are no sane defaults to use
>     and installing the package requires manual steps from the maintainer
>     before it can be made to work", but (a) our best practices recommend
>     against doing that if at all possible, and (b) in that case starting
>     the daemon shouldn't even be attempted from postinst, and so failure
>     to start can't be a consideration in the exit state of postinst.

You might still want to restart it on upgrade to ensure you don't run
outdated binaries.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are
