Subject: Thoughts on Debian quality, including automated testing

[ I'm subscribed to -devel, no Cc required. I apologize for the length,
but it's only a bit over 3000 words. I hope the section titles help, if
you want to skip parts. ]
For some time now I have been thinking about ways to make Debian better
from a technical point of view. Most of my actual efforts have gone into
writing piuparts[1], running it against packages in the archive, and
reporting any problems. I have also spent some time thinking about
related issues.

[1] http://packages.debian.org/unstable/devel/piuparts

This mail is primarily prompted by Ian Jackson's proposal[2] to specify
a framework for automated testing. I meant to write and send it weeks
ago, but for various reasons I kept postponing finishing it. Sorry about
that. Part of the reason is that as I kept thinking and writing about
this, I kept expanding the scope. As a result, this mail is quite long.
Sorry about that, too. Further, I keep mentioning piuparts in this mail.
Sorry if it seems like I'm advertising it.

I'll start by saying that I fully support writing automated tests for
Debian packages. Automated tests can do very good things for the
development of individual programs. They can do the same for Debian
packages, and for Debian as a whole.

[2] http://lists.debian.org/debian-project/2005/11/msg00073.html
    and other threads

Before I get down to details, I'd like to be a bit philosophical and
preachy. You may want to skip a few paragraphs.

The quality of Debian is not bad at all. Debian works quite well for a
large number of people, and we get fairly few bug reports from them
relative to the number of programs we have packaged. That's pretty much
the only objective criterion we can currently use to determine real
quality. Quality is sometimes hard to define; I claim that "the package
has few bug reports in proportion to its user base" is one important
indicator of high quality.

Still, we could do much better. Our two best known quality assurance
tools, lintian and linda, are obviously not used by a lot of package
maintainers[4], given the number of packages that have problems.
Consider, for example, lintian's test for zero-byte files in the doc
directory[5].
About a hundred packages fail that test, yet the problem is utterly
simple to fix.

[4] http://lintian.debian.org/reports/tags.html
[5] http://lintian.debian.org/reports/Tzero-byte-file-in-doc-directory.html

These zero-byte files are not a "real" problem: they use up an inode,
and make people spend a few seconds extra when looking for information
about the package, but they don't actually break anything. Yet the
packages in question would be of higher quality if the files were
non-empty or didn't exist at all. They may also indicate other
sloppiness, which may or may not be caught by automatic tools.
Sloppiness tends to result in real problems sooner or later. I propose
"respected automated tools find few problems" as the second indicator
of quality.

To improve the quality of Debian, we need to do several things:

  A) Prevent bugs from happening in the first place
  B) Find and report bugs
  C) Fix bugs that have been reported
  D) Prevent bugs from entering the archive

I will now discuss each of these things. After that I'll finally get to
discussing automated testing the way Ian Jackson proposed it.

A) Prevent bugs from happening in the first place
=================================================

In general, the way to prevent bugs from happening at all is to reduce
complexity. Simple things are easier to get right. Most programmers
find that using tools with higher abstraction levels reduces complexity
and the number of bugs they create for a given task. As an example,
writing the shell command "cat *.txt > all.dat" is much more likely to
work correctly than writing the same program in C, where you would have
to open, read, and write the files yourself, check for errors, and so
on.

In a Debian packaging context, this might mean using packaging helpers
to take care of the boring, repetitive chores that are the same from
one package to the next.
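To make the helper idea concrete, here is a sketch of what a
helper-based debian/rules can look like. This is an illustration only:
the exact set of dh_* invocations and targets varies from package to
package.

```make
#!/usr/bin/make -f
# Sketch of a debhelper-based debian/rules; the dh_* calls shown are
# illustrative, real packages pick the ones they need.

build:
	dh_testdir
	$(MAKE)

clean:
	dh_testdir
	dh_clean

binary: build
	dh_testdir
	dh_testroot
	dh_installdocs
	dh_installchangelogs
	dh_compress
	dh_fixperms
	dh_gencontrol
	dh_md5sums
	dh_builddeb

.PHONY: build clean binary
```

Each line is a call to one small, well-tested tool, which is exactly
where the reduction in complexity comes from.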
For example, debhelper is pretty good at reducing the debian/rules file
to just a handful of simple invocations of the individual debhelper
tool programs. Each invocation is very simple, and very simple bits of
code are usually correct. Result: fewer bugs in packages and in Debian.
Debhelper is a very good thing indeed.

I'm not saying that using debhelper, or another packaging helper,
should be mandatory. They are merely one way of reducing the
probability of bugs, and because of that, I am happy that most packages
in Debian do use them. On the whole, our quality is better thanks to
that. If you can make bug-free packages without a helper, by all means
don't use one (I don't; my packages are simple enough as they are).

There are other ways of combating complexity. Picking sensible defaults
and not making the package configurable via debconf is one. I'd add
more examples, but I can't think of any right now (and people on IRC
are discussing dressing pork in yellow underwear, which is highly
distracting). Where we can, we should avoid complexity and make things
simpler. If you have ideas for how to do this, please tell.

B) Find and report bugs
=======================

We currently have about 82 thousand bugs open (counting from the BTS
summary page on packages; this includes all severities and tags).
That's a lot of bugs. There are, however, about 11 thousand packages
with bugs, so there are only about 7.3 bugs open per package, on
average. Our new bug numbers are in the 340 thousand range, so we've
closed about 258 thousand reported bugs (plus a lot of unreported ones)
over the years. That's a truly huge number of bugs.

A bug report is a wonderful thing. It means that there is no longer any
need to wonder whether there is something wrong in the package: you
know there is, and you know what it is. Someone, a nice person using
your package, has gone to the trouble of finding out what the problem
is, and has also decided to tell you. It is delightful when people
decide to be so helpful.
Sometimes it takes a long time for anyone to report a bug, though, so
it would be reassuring to be more proactive in finding bugs ourselves.
Our two well known tools for this are lintian and linda. They examine
packages for patterns that tend to indicate problems. On the whole,
they are very simple systems, but even so, they find a lot of problems.
Most problems probably never enter the Debian archive, because
packagers use the tools and fix anything that needs fixing before
uploading. The tool I wrote, piuparts, is similar: it should be used by
the package maintainer before uploading, so that a buggy package never
enters the archive. Piuparts is pretty new, so it's unsurprising that
most people don't use it yet.

C) Fix bugs that have been reported
===================================

Not all of the bugs the automatic tools find are fixed, however, and
there are those 82 thousand other bugs that need fixing as well. While
many of them are wish list bugs, or it is questionable whether they are
bugs at all, they all require some attention. What can we do about
that? Can we get down to no (fixable) bugs in a stable release?

(A fixable bug here means a bug that can be fixed at all with
reasonable effort by the Debian package maintainer. Having to rewrite
the X server doesn't count as reasonable effort. Wish list bugs should
probably be excluded as well.)

Having no bugs is a good state to be in. When a bug is reported, it is
usually easier to fix if there aren't a bunch of other bugs disturbing
the process. Our Bug Squashing Parties are useful, but they mostly
concentrate on release critical bugs. Other bugs get less attention,
but need to be fixed too. I'm guilty of ignoring many of the bugs
against my own packages for extended periods of time. People like me
are part of the problem.

Several ideas have been floating around for years on how to improve
this situation, of which I'd like to mention three.
While I've used the number of bugs here as the measure of a package's
quality, the same ideas might help with other aspects, like getting new
upstream versions packaged soon after they're released.

* Team maintenance. If a package is maintained by a team, there are
  more people sharing the work. When a team works well, more people
  look at the package, and finding and fixing problems is more
  effective. There is less work per person, so things don't lag as
  much. A well-working team is a good thing. As an example, the Debian
  GNOME team seems to work really well: transitions to the next
  upstream version happen quite smoothly these days.

  Making teams mandatory for all packages seems ridiculous to me,
  though. Lots of packages are so small that having to arrange a team
  for them, even if it is only the effort to set up and subscribe to a
  team mailing list, is wasteful. Not everyone likes to work in a close
  team, either, and we shouldn't exclude them.

* Less strong ownership of packages. The current state in Debian is
  that the package maintainer (or maintainer team) owns the package,
  and as long as they don't cause a lot of trouble, and don't have
  release critical bugs, everyone else is invited to keep their hands
  off. If the maintainer, for whatever reason, can't keep the quality
  of the package up, it has to degrade a lot before anyone else dares
  to touch it. If this Non-Maintainer Upload threshold were lowered,
  quality might improve. There would probably be a number of mistakes
  made, but that also happens when people take on a new package.

  This is not the same as maintenance by a team. An NMU is done by
  someone interested in the package for whatever reason, but they only
  do the upload to fix a specific problem or problems, not to join the
  maintainer team for a long time. This idea hasn't been tested.
  It could be tested if some group of maintainers declared that some or
  all of their packages were part of the experiment: anyone could NMU
  them for any reason whatsoever, as long as they took proper care not
  to mess the package up. (I'm willing to participate in such an
  experiment myself, but I haven't thought out the details yet.)

* Abolishing package ownership completely. This is a more radical
  version of the previous one. I'm not going to argue for it until the
  milder form has been tested first.

The main theme here is the need for speed: bugs need to be closed
faster, and anything that helps with this would be good.

D) Prevent bugs from entering the archive
=========================================

In program development, it is usually a good idea not to commit
anything until it passes all automatic tests. Similarly, I propose that
it would be good for Debian to use some of the automatic tools before a
package is accepted into the archive. For example, if lintian finds the
init.d-script-does-not-implement-required-option error[6], is there any
reason to accept the package, since it is certainly buggy?

[6] http://lintian.debian.org/reports/Tinit.d-script-does-not-implement-required-option.html

There are some practical issues with this, of course. Not all lintian
and linda warnings should prevent accepting a package, because that
might prevent fixing more important problems quickly. That's
fine-tuning, however; the general principle still applies: it's better
to prevent a buggy package from entering the archive than to fix it
later.

Some of the automatic checking might be too heavy, or too risky, or
otherwise impractical to run when processing incoming packages. In
these cases, it is better to accept the package into the archive, run
the tests later, and file bugs for any problems found.

Lintian has been run on all packages for many years. The results are
listed on a website[7], but many packages go for months, even years,
without their problems being fixed.
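The accept/reject principle can be sketched in a few lines of shell.
The lintian report below is simulated with a fixed string (a real gate
would run lintian on the uploaded .changes file), the package and tag
names are illustrative, and the policy of rejecting on error-severity
"E:" lines while letting warnings through is just one possible
fine-tuning:

```shell
# Simulated lintian report; a real gate would use: lintian foo.changes
report='E: foo: init.d-script-does-not-implement-required-option /etc/init.d/foo
W: foo: zero-byte-file-in-doc-directory usr/share/doc/foo/README'

# Reject if any error-severity line is present; warnings alone pass.
if printf '%s\n' "$report" | grep -q '^E:'; then
    echo "REJECT: package is certainly buggy"
else
    echo "ACCEPT"
fi
```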
When I started running piuparts on all packages, I decided to report
any problems as bugs, instead of just publishing log files. This seems
to work better: many of the bugs have been fixed, some of them even on
the same day.

[7] http://lintian.debian.org/

Automated testing of program functionality
==========================================

I'm speaking here about whether the programs in the package work, not
whether the packaging itself works. Lintian, linda, and piuparts
already test the packaging fairly well, I think. Also, I'm speaking
about active tests that require running programs in the package;
lintian and linda don't do that, and shouldn't. In this section
especially I'm partly rephrasing what Ian Jackson and others said in
the previous discussion (or what I think they said), and partly adding
my own thoughts.

Having a way to automatically test that a package is at least minimally
functional is clearly a good thing. Speaking from the point of view of
someone who occasionally does NMUs to fix release critical bugs in
other people's packages, the easier it is to check that a package still
works after I've mucked about with it, the easier things will be for
everyone involved.

Automatic testing needs to happen in various contexts:

* When the package is being built. Most such tests should go into an
  upstream test module. Traditionally, this would correspond to
  "make check".

* When the package has been built, but before it is uploaded. This is
  similar to testing with lintian, linda, and piuparts. The difference
  from build-time tests is that the tests are run when the package is
  installed onto a system (possibly a chroot or a virtual system).

* Before an uploaded package is accepted into the archive. This would
  prevent buggy packages from entering the archive.

* On specifically crafted test systems. This would check that packages
  still work even though other packages they depend on have changed.

* On real systems, to verify that things still work.
  This would potentially be a big help to system administrators.

Some issues:

* Test data. Some tests are going to require a large amount of test
  data, and that is best kept out of the binary package. It is probably
  best to keep it in the source package only; the test program then
  needs to install (and maybe partially build) the source package.

* Test dependencies. Many tests will require tools that neither using
  nor building the package needs. Thus we probably need a
  "Tests-Depends" field (for the source package).

* Generic tests. Since Debian has so many packages, as much as possible
  should be tested using generic tests that apply to many packages. For
  example, checking that an executable can be run at all should be a
  generic test; expecting ten thousand source packages to each add a
  test for that is unwarranted optimism. Generic tests should require
  nothing from the package itself.

* Specific tests. Obviously it is also necessary for each package to be
  able to provide tests for its particular peculiarities, such as
  reproductions of old bugs, to keep them from re-appearing. The
  interface for this should allow various tools to be used for
  implementing the tests, so that there is room for good tools to
  evolve. Compare the situation with a raw debian/rules file versus
  helper packages.

* Non-burdening of buildds. Especially slow architectures might want to
  skip build-time tests to save time. There should be a way to build
  the package without running the tests, and of course to run the tests
  only.

My concrete proposals:

* Let's write a tool that can do at least simple generic tests (we'll
  expand it later).

* Let's standardize on a way to invoke package specific tests:
  "debian/rules test-build" for build-time tests and "debian/rules
  test-install" for tests of the installed package. Neither must
  require the package to be built already. The rules targets can only
  be assumed to exist if debian/control contains a "Tests-Depends"
  field.
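  A sketch of what those two targets might look like in debian/rules.
  Everything here is hypothetical: the target names come from this
  proposal, "make check" is just the traditional upstream convention,
  and "debian/tests/run-installed-tests" is a placeholder for whatever
  script or tool ends up driving installed-package tests.

```make
# Hypothetical targets for the proposed interface; callers may only
# assume they exist if debian/control has a Tests-Depends field.

test-build:
	# Build-time tests, e.g. the upstream test suite.
	$(MAKE) check

test-install:
	# Tests against the *installed* package; the script name is a
	# placeholder, not an agreed convention.
	sh debian/tests/run-installed-tests

.PHONY: test-build test-install
```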
  Whoever calls these targets must take care of installing the test
  dependencies.

* Let's modify pbuilder to run the test-build tests and (if possible)
  also the generic tool and the test-install tests. These belong better
  in pbuilder than in piuparts, I think, but it might be that piuparts
  should run them as well.

* Let's also write a tool that a sysadmin (or tester) can use to run
  test-install for particular packages, or for all installed packages.

* Ian's proposed debian/tests/control interface sounds nifty. I'm not
  going to debate the exact details (at least not here); they should
  probably be decided by the implementer ("those who do, decide"). It
  should be implementable as a tool that can be run from "debian/rules
  test-install", I think. This will allow Debian packagers who like it
  to use it, while those who prefer something else can use that
  instead. Some people might even want to use all available tools.

* After all this is done, let's start a campaign where every bug fix
  includes a patch that adds a regression test for the bug.

Let's take quality assurance seriously
======================================

Quality assurance is currently performed by a few people organized
around the debian-qa mailing list, and by various other people
(including me). I see the need for a more aggressive, systematic
approach to quality assurance. This might be implemented by
(re-)forming the debian-qa team with a modified agenda, something like
this: the task of the debian-qa team is to proactively find and fix
(technical) problems in Debian packages, and to temporarily maintain
orphaned packages.

Some of the things that it might make sense to organize better include
(if these are already organized well, I apologize):

* Reporting serious problems found by lintian/linda as bugs against
  packages.

* Reporting problems found by piuparts. I already do this, but it
  would be good to expand it to a couple more architectures (at least),
  and to have more people processing logs of failed tests.
* Testing that all packages of optional or higher priority actually
  can be installed at the same time. (This should be partly doable by
  analyzing Contents files, if it isn't being done already.)

* Testing that all packages can still be re-built even when compilers,
  libraries, or other build dependencies have changed.

* Checking that a system with as many packages as possible installed
  can be upgraded from stable via testing to sid.

Then, of course, there is the fixing of bugs, but I've discussed that
above.

PS. Sorry again, my footnote numbering got confused.

-- 
On a clear disk, you seek forever.