Bug#904558: What should happen when maintscripts fail to restart a service
Stuart Prescott said [Wed, Sep 19, 2018 at 12:18:24PM +1000]:
> (...)
> That was perhaps also written before we started to realise that maintainer
> scripts are actually best avoided as they tend to be complicated, fragile,
> difficult to do right and make upgrades harder for the package manager. In
> the intervening two decades, we've gone from "maintainer scripts are cool"
> to "the best maintainer script is the one that doesn't exist".
>
> So yes, ignoring errors seems wrong but…
> (...)
> … causing a snowball of errors in an awkward half-upgraded environment is
> nasty.
>
> The problem comes when you don't yet have the right tools installed to be
> able to fix the problem. We see that scenario often enough in #debian where
> someone has a failed upgrade and we try to collect more information via
> pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover
> that the relevant tool isn't installed and because apt is sufficiently
> unhappy about broken packages and a half-completed upgrade, you can't ask it
> to install the tool at that point in time.
>
> In the upgrade scenario, while you're trying to fix one particular problem,
> you're also in a completely untested half-upgraded situation and so latent
> bugs in any number of other tools may also be exposed.
>
> So while ignoring errors is wrong, so is making it harder to fix them. This
> isn't a question of absolutes.

I completely agree with Stuart here.

Yes, of course, there is a reason for maintainer scripts to exist, and if they fail to set up things around the package, of course, the user _needs_ to know something is off in their system. But that should happen _very_ seldom.

As Stuart says, helping non-technical users out of this situation can be quite hard, and quite discouraging for the user. We have to make sure the scripts are as foolproof as possible, and failing to stop or restart a daemon should _never_ cause the system to enter such a state.
Bug#904558: What should happen when maintscripts fail to restart a service
Ian Jackson wrote:
>> I personally think that it would make sense for the policy to at least
>> recommend what should happen with regards to maintainer scripts and
>> typical operations that are performed in them.
>
> There is already a section on error handling in scripts, which (IMO
> correctly) says that shell scripts should use set -e.
>
> When I wrote that, it didn't occur to me that anyone would think that
> a failure by a postinst script to perform an intended operation should
> be treated any other way than a failure of the postinst script.

That was perhaps also written before we started to realise that maintainer scripts are actually best avoided as they tend to be complicated, fragile, difficult to do right and make upgrades harder for the package manager. In the intervening two decades, we've gone from "maintainer scripts are cool" to "the best maintainer script is the one that doesn't exist".

So yes, ignoring errors seems wrong but…

>> And, while I'm open to be convinced otherwise, I don't see any benefit
>> from postinst (particularly postinst + configure) ever failing.
>
> Frankly I'm disturbed to be reading this, here. See above.
>
> If the postinst fails, then the user has the opportunity to fix the
> root cause and rerun dpkg --configure --pending. That will
> then repair the system completely.

… causing a snowball of errors in an awkward half-upgraded environment is nasty.

The problem comes when you don't yet have the right tools installed to be able to fix the problem. We see that scenario often enough in #debian where someone has a failed upgrade and we try to collect more information via pastebinit, strace, traceroute, netcat, gdb, etc; we frequently discover that the relevant tool isn't installed and because apt is sufficiently unhappy about broken packages and a half-completed upgrade, you can't ask it to install the tool at that point in time.

In the upgrade scenario, while you're trying to fix one particular problem, you're also in a completely untested half-upgraded situation and so latent bugs in any number of other tools may also be exposed.

So while ignoring errors is wrong, so is making it harder to fix them. This isn't a question of absolutes.

cheers
Stuart

--
Stuart Prescott    http://www.nanonanonano.net/   stu...@nanonanonano.net
Debian Developer   http://www.debian.org/         stu...@debian.org
GPG fingerprint    90E2 D2C1 AD14 6A1B 7EBB 891D BBC1 7EBB 1396 F2F7
Bug#904558: What should happen when maintscripts fail to restart a service
]] Ian Jackson

Hi,

> There may be good reasons not to treat daemon startup failure as a
> postinst failure, but the argument above is not one of them.

I think this is the core question. I largely agree with Ian here that having postinsts fail is not that big a deal if they can't make forward progress, but we're also being asked to advise on what happens when a maintainer script fails to restart a service.

I disagree with him on whether failure to start/restart a service should be considered a configuration failure. The API a package provides by being in the configured state does not include whether the relevant daemon is running; that is runtime state, which can and will change many times while the package is in the configured state, so dpkg dependencies are not useful for expressing «this service must be running». (There's also the case where the service is running on a separate host, which is common for services such as databases, and where the use of Depends is inappropriate.)

I think the general rule should be that the success/failure of the postinst script should signal whether the package considers itself ready to provide whatever API it exists to provide (disregarding the case of Essential packages here, since those are special). This means that failure to start a daemon should generally not cause the postinst to fail.

At the same time, I think there are exceptions to this rule that should be left to maintainer judgement: sshd comes to mind as a service where, if it can't restart, you want the system to make it very clear that something is wrong that you might want to fix sooner rather than later (since failing to do so can leave you unable to access the machine after a reboot).

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are
Bug#904302: Whether vendor-specific patch series should be permitted in the archive
]] Philip Hands

> Tollef Fog Heen writes:
>
> >This should be implemented in Debian Policy by declaring that a a
> ^^^
> You've this doubled 'a' on two occasions in this text.

I'll fix that, thanks for spotting it.

> Presumably we would not want to see new packages adopting the use of
> vendor-specific patch series prior to Buster.
>
> Do we need to make the "SHOULD NOT" conditional on the package already
> having a vendor-specific patch series at the time of this resolution?

I think that just adds needless complexity and assumes that maintainers will want to add bugs to their package. I really hope that's not the case, so I don't think it's worthwhile to add extra language for it.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are
Bug#904558: What should happen when maintscripts fail to restart a service
Margarita Manterola writes ("Bug#904558: What should happen when maintscripts fail to restart a service"):
> Sorry that it took so long to get back to this bug. The other bug took
> all the attention.
...
> If a postinst fails (for whatever reason), the package is left in a
> broken state (Failed-Config) which in general makes the package
> management system unhappy.

The other effect is that the package's reverse dependencies are not configured, so their postinsts do not experience a broken situation.

> It seems that the only reason why one may want to do this is to call
> the attention of the sysadmin so that they can solve the problem.
> However, in a world where a large number of users are running automatic
> updates, leaving the package management system in a broken state is
> pretty sad, not very visible and rather confusing for the user when
> they finally encounter it.
>
> Is there another use case for leaving the package in Failed-Config
> that we missed?

If you deliberately cause the postinst to succeed when the package is nonfunctional, then the package's r-dependencies will be configured (i.e. have their postinsts run) in the broken state. The r-dependencies' postinsts may then do wrong things. They may leave the r-dependencies in anomalous states.

If one takes the argument you make above to its logical conclusion, all those postinsts should also report success. The result is a system where the only thing that is happy is the package management system, and the records of the root cause of the problem, and of how the failed operations might be reattempted, have been lost.

I guess you will infer from what I write above that I consider "reporting errors causes the next layer to be unhappy" and "reporting errors causes the user to be unhappy" to be extraordinarily bad arguments.

There may be good reasons not to treat daemon startup failure as a postinst failure, but the argument above is not one of them.

> It's unclear why the service (re)start needs to be a special case.
Service (re)starts are more likely to fail for unrelated reasons. Also, some packages are able to provide much of their intended API even without the daemon.

I think the general rule of thumb should be that a daemon startup failure should be treated as a configuration failure. I'm content with a situation where maintainers feel free to diverge from this if there are reasons to do so.

> I personally think that it would make sense for the policy to at least
> recommend what should happen with regards to maintainer scripts and
> typical operations that are performed in them.

There is already a section on error handling in scripts, which (IMO correctly) says that shell scripts should use set -e.

When I wrote that, it didn't occur to me that anyone would think that a failure by a postinst script to perform an intended operation should be treated any other way than a failure of the postinst script.

(In the usual case. There are of course lots of situations where the right approach is some kind of error recovery, or the operation was attempted "just in case", or something, in which case more subtle error handling is called for.)

> And, while I'm open to be convinced otherwise, I don't see any benefit
> from postinst (particularly postinst + configure) ever failing.

Frankly I'm disturbed to be reading this, here. See above.

If the postinst fails, then the user has the opportunity to fix the root cause and rerun dpkg --configure --pending. That will then repair the system completely.

Ian.

--
Ian Jackson    These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Next Meeting - Wednesday, September 19th 19:00 UTC (tomorrow)
Dear Technical Committee members,

Our monthly meeting will take place tomorrow at 19:00 UTC. These are the items on the agenda (also committed to git):

* Review of previous meeting AIs
* #904302 Whether vendor-specific patch series should be permitted in the archive
* #904558 What should happen when maintscripts fail to restart a service
* Additional Business

See you there!

--
Regards,
Marga