On Mon, 6 Feb 2012 08:31:15 +0100 Raphael Hertzog <hert...@debian.org> wrote:
> If you discover any bug in dpkg's multiarch implementation, please
> report it to the BTS (against the version 1.16.2~wipmultiarch).

I'd like to ask for some help with a bug which is tripping up my tests
with the multiarch-aware dpkg from experimental - #647522 -
non-deterministic behaviour of gzip -9n.

Some MultiArch: same packages in the archive (libppl9 is the one I came
across first) contain .gz files in ./usr/share/doc/ which differ between
architectures when, AFAICT, the original/decompressed file does not.
i.e. this isn't a bug in libppl9.

Strangely, unpacking the .deb, decompressing these files and then
recompressing them with gzip -9nf changes the checksum of the .gz file
*to match the other architectures*.

e.g. the armel package has a bad .gz file, the armhf has a good one.
the kfreebsd-amd64 package has a bad .gz file, the amd64 has a good one.
If that matrix was flipped diagonally, it might make more sense. ;-)

The bad checksums also *match* between armel and kfreebsd-amd64.

armel, kfreebsd-amd64:
0e52e84eebf41588865742edaff7b3c0  usr/share/doc/libppl9/CREDITS.gz
armhf, i386, amd64:
99e2b9f8972ce00cfe57e3735881015e  usr/share/doc/libppl9/CREDITS.gz

By bad, I mean that the .gz file, when decompressed and recompressed,
changes checksum to match the other architecture. It appears to be a
boolean change, not random or N-ary.

In this case, it also changes the filesize:

armel, kfreebsd-amd64:
6344 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz
armhf, i386, amd64:
6343 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

(Jakub Wilk originally spotted a checksum change without a filesize
change, so filesize is not the best indicator, hence the checksum test.)

Decompress and recompress the file from the kfreebsd-amd64 or armel
packages on amd64 or armel and the filesize changes back to 6343 and
the checksum changes to that of amd64/armhf/i386 etc. making the bug
very hard to reproduce.
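
The decompress-and-recompress check described above can be sketched in
Python as follows. This is only an illustration, not the gzip.sh test
script linked from this mail: the function names are hypothetical, and it
assumes a `gzip` binary on the PATH that accepts `-9n -c`. The compressor
is passed in as a parameter so that a different one can be substituted.

```python
import gzip
import hashlib
import subprocess


def md5_hex(data: bytes) -> str:
    """Return the md5 checksum of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def gzip_9n(raw: bytes) -> bytes:
    """Compress raw bytes with the system gzip, as `gzip -9n -c` would.

    Assumption: a gzip executable is available on the PATH.
    """
    result = subprocess.run(
        ["gzip", "-9n", "-c"],
        input=raw,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def recompression_report(gz_bytes: bytes, compress=gzip_9n) -> dict:
    """Decompress a .gz payload, recompress it, and compare the two.

    Reports both checksums plus the size difference, since a checksum
    can change without the file size changing.
    """
    raw = gzip.decompress(gz_bytes)
    again = compress(raw)
    return {
        "original_md5": md5_hex(gz_bytes),
        "recompressed_md5": md5_hex(again),
        "changed": gz_bytes != again,
        "size_delta": len(again) - len(gz_bytes),
    }
```

Reading CREDITS.gz from an unpacked package and passing its bytes to
recompression_report() would show whether that particular file is one of
the "bad" ones, i.e. whether "changed" comes back True.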
The change does not happen in reverse, neither can I regenerate the .gz
file with the original checksum on the architecture which showed the
original problem. Once the bad checksum changes to the good one,
repeating the compression retains the good checksum. (The .gz file with
the changed checksum really is different - it is one byte larger and
3 bytes differ.)

I've run the test script for a couple of hundred iterations and the
checksum always changes after the first decompress+compress cycle but
never changes back.

So far, I've tried this on abel.debian.org, inside and outside the sid
chroot, and on amd64. Either the armel or kfreebsd-amd64 package can be
unpacked and the CREDITS.gz file decompressed and recompressed - the
filesize and checksum change to the values seen on armhf and amd64.

Can someone spot whether I've made a mess of the test script or whether
there is something else going on here?

http://people.debian.org/~codehelp/gzip.sh.txt

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=647522

It would be a very laborious task to check the md5sums of every .gz file
in /usr/share/doc in every MultiArch: same package across all
architectures and the Contents-* files on the mirrors don't contain the
filesize of the listed files. Does anyone have ideas on how to scan the
archive for this kind of problem?

If we can't pin this down, it is going to make MultiArch very hard to
deliver - any package build could make some MultiArch combinations
uninstallable in ways that are very hard to detect in advance, causing
entire dependency chains to fail to install.

The manifestation of the issue in libppl9 is clear when trying to
install the MultiArch build-dependencies for cross-compilers:

$ sudo apt-get install libcloog-ppl-dev:armel
Selecting previously unselected package libppl9:armel.
(Reading database ... 167711 files and directories currently installed.)
Unpacking libppl9:armel (from .../libppl9_0.11.2-6_armel.deb) ...
dpkg: error processing /var/cache/apt/archives/libppl9_0.11.2-6_armel.deb (--unpack):
 './usr/share/doc/libppl9/CREDITS.gz' is different from the same file on the system

This then leaves the installation in a broken state and needs careful
manual intervention to remove the dependencies of the broken package as
`apt-get -f install` wrongly tries to just reinstall the libppl9:armel
package again.

dpkg is correct in its current handling - the files really are
different. The problem is that the uncompressed file is not.

Comment from Paul Effert:

> I should add that it's OK (from the point of view of
> the RFCs) if gzip produces different outputs given the same
> inputs when compressing. The RFCs allow that and presumably
> other gzip implementations do that. All that's required is
> that compress+decompress result in a copy of the original.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=647522#20

What we're seeing here are differences after decompress+compress but
without a reproducible test for this bug, dpkg might have to implement a
workaround.

I'm wondering if this means that dpkg will have to try and decompress
the .gz files in /usr/share/doc to verify if the *contents* are the same
before failing to install if the .gz itself differs.

With so few packages currently converted to MultiArch: same, it's
worrying that the first package I tried hit this bug.

-- 
Neil Williams
=============
http://www.linux.codehelp.co.uk/
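
The content-level comparison raised above (checking whether two differing
.gz files decompress to the same bytes) could look roughly like the
following Python sketch. It is not how dpkg works and is not a proposed
patch. The function names and the three verdict strings are hypothetical,
and it assumes the same package has already been unpacked for two
architectures into separate directories, for example with `dpkg-deb -x`.

```python
import gzip
from pathlib import Path


def classify_gz_pair(a: bytes, b: bytes) -> str:
    """Compare two .gz payloads at the compressed and decompressed level."""
    if a == b:
        return "identical"
    if gzip.decompress(a) == gzip.decompress(b):
        # The case from this bug: only the compressed form differs.
        return "compressed-differs-only"
    return "content-differs"


def scan_trees(root_a: Path, root_b: Path) -> list:
    """List .gz files present in both unpacked trees that are not identical.

    Returns (relative path, verdict) tuples; files that exist in only
    one of the two trees are skipped.
    """
    findings = []
    for path_a in sorted(root_a.rglob("*.gz")):
        rel = path_a.relative_to(root_a)
        path_b = root_b / rel
        if not path_b.is_file():
            continue
        verdict = classify_gz_pair(path_a.read_bytes(), path_b.read_bytes())
        if verdict != "identical":
            findings.append((rel.as_posix(), verdict))
    return findings
```

Run over the unpacked armel and armhf trees of one package, any
"compressed-differs-only" entry would be a file that blocks co-installation
even though the uncompressed content is the same on both architectures.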