On Tue, 07 Feb 2012 19:11:16 +0100 Michael Biebl <bi...@debian.org> wrote:
> On 07.02.2012 18:07, Joey Hess wrote:
> > Neil Williams wrote:
> >> I'd like to ask for some help with a bug which is tripping up my tests
> >> with the multiarch-aware dpkg from experimental - #647522 -
> >> non-deterministic behaviour of gzip -9n.
> >
> > pristine-tar hat tricks[1] aside, none of gzip, bzip2, xz are required
> > to always produce the same compressed file for a given input file, and I
> > can tell you from experience that there is a wide amount of variation. If
> > multiarch requires this, then its design is at worst broken, and at
> > best, there will be a lot of coordination pain every time there is a
> > new/different version of any of these that happens to compress slightly
> > differently.

Exactly. I'm not convinced that this is fixable at the gzip level, and it
is not likely to be fixed by the trauma of changing from gzip to something
else either - that would be pointless. What matters, to me, is that package
installations do not fail somewhere down the dependency chain in ways which
are difficult to fix.

Compression is used to save space, not to provide unique identification of
file contents. As it is now clear that the compression is getting in the
way of dealing with files which are (in terms of their actual *usable*
content) identical, the compression needs to be taken out of the comparison
operation. Where the checksum matches, that's all well and good (problems
with md5sum collisions aside); where it does not match, dpkg cannot deem
that the files conflict without creating a checksum based on the
decompressed content of the two files. A checksum of a compressed file is
clearly unreliable as a test of identity and will generate dozens of
unreproducible bugs.

MultiArch has many benefits, but saving space is not why MultiArch exists,
and systems which will use MultiArch in anger are not likely to be short of
either RAM or swap space. Yes, the machines which are *targeted* by the
builds which occur as a result of having MultiArch available for Emdebian
will definitely be "low resource" devices, but those devices do NOT need to
use MultiArch themselves. In the parlance of --build, --host and autotools,
MultiArch is a build tool, not a host mechanism. If you've got the
resources to cross-build something, you have the resources to checksum the
decompressed content of some files.

As for having MultiArch to install non-free i386 on amd64, that is less of
a problem simply because the number of packages installed as MultiArch
packages is likely to be much smaller. Even so, although the likelihood
drops, the effect of one of these collisions getting through is the same.

> This seems to be a rather common problem as evidenced by e.g.
>
> https://bugs.launchpad.net/ubuntu/+source/clutter-1.0/+bug/901522
> https://bugs.launchpad.net/ubuntu/+source/libtasn1-3/+bug/889303
> https://bugs.launchpad.net/ubuntu/oneiric/+source/pam/+bug/871083

See the number of .gz files in this list:
http://people.debian.org/~jwilk/multi-arch/same-md5sums.txt

> In Ubuntu they started to work-around that by excluding random files
> from being compressed. So far I refused to add those hacks to the Debian
> package as this needs to be addressed properly.

Maybe the way to solve this properly is to remove compression from the
uniqueness check: compare the contents of the file in memory after
decompression. Yes, it will take longer, but it is only needed when the
md5sums (which already exist) do not match.
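To make that concrete, here is a minimal Python sketch - illustration only,
not dpkg code; the differing compresslevel values merely stand in for the
gzip-version-to-version variation described above. It shows the same content
hashing differently once compressed, and the proposed fallback decompressing
only when the compressed checksums disagree:

    import gzip
    import hashlib

    def md5(data):
        return hashlib.md5(data).hexdigest()

    # Identical payload compressed with two different settings; the differing
    # compresslevel stands in for two gzip/zlib versions choosing slightly
    # different encodings for the same input. mtime=0 mirrors gzip -n
    # (requires Python 3.8+).
    payload = b"identical file shipped by two architectures\n" * 1000
    a = gzip.compress(payload, compresslevel=9, mtime=0)
    b = gzip.compress(payload, compresslevel=1, mtime=0)

    print("compressed md5s equal:  ", md5(a) == md5(b))    # typically False

    def same_content(x, y):
        # Fast path: the compressed files are byte-identical.
        if md5(x) == md5(y):
            return True
        # Fallback: compare checksums of the *decompressed* contents.
        return md5(gzip.decompress(x)) == md5(gzip.decompress(y))

    print("decompressed md5s equal:", same_content(a, b))   # True

Only the slow path costs anything, and it would only be taken for the
handful of files whose compressed checksums already disagree.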
The core problem is that the times when the md5sum of the compressed file
won't match are unpredictable. No workaround is going to be reliable,
because there is no apparent logic to which files become affected, and any
file which was affected at libfoo0_1.2.3 could well be completely blameless
in libfoo0_1.2.3+b1. (binNMUs aren't the answer either, because that could
just as easily transfer the bug from libfoo0 to libfoo-dev and so on.)

There appears to be plenty of evidence that checksums of compressed files
are only useful until the checksums fail to match, at which point I think
dpkg will just have to fall back to decompressing the contents in RAM /
swap and doing a fresh checksum on the contents of each contentious
compressed file. If the checksums of the contents match, the compressed
file already on the filesystem wins. Anything else and Debian loses the
reproducibility which is so important to developers and users. When I need
to make a cross-building chroot from unstable (or write a tool for others
to create such chroots), it can't randomly fail today, work tomorrow and
fail with some other package the day after.

If others agree, I think that bug #647522, currently open against gzip,
could be reassigned to dpkg and retitled to "do not rely on checksums of
compressed files when determining MultiArch file collisions".

-- 
Neil Williams
=============
http://www.linux.codehelp.co.uk/