Bug#1002056: Supporting alternative zlib implementations
* Sebastian Andrzej Siewior [2024-10-03 22:03]: > On 2024-09-26 01:35:45 [+0200], Fay Stegerman wrote: > > For example, ZIP files or Android APKs built on a Debian system will have a > > different compressed stream, like the test files you mention. Which will > > likely > > break Reproducible Builds tooling like apksigcopier [1] and > > reproducible-apk-tools [2]. > > wouldn't it work to compare the decompressed stream? Is an identical ZIP > file a requirement? By definition a Reproducible Build means a bit-by-bit identical APK, including the signature (which is why I built a tool to extract an existing signature and use it as a build input instead of the private key). Which means you need identical compressed data for Reproducible Builds. Having identical uncompressed data gets you pretty close to the goals of RB, but unpacking and/or skipping over signatures is very very hard to get right and simply cannot provide the same guarantees as having two bitwise identical files. And it's impossible to create an APK you can actually install if it's not bit-by-bit identical as the signature would not be valid otherwise. So yes, unfortunately an identical ZIP file is a requirement and comparing the decompressed stream not an option, which is why this kind of change is not something we can just consider an implementation detail or work around. I wrote more about the very messy situation Fedora's switch to zlib-ng already created for Android Reproducible Builds [1]. Which likely would have broken a lot more reproducible Android apps already if Fedora's OpenJDK packages linked against the system zlib like Debian's OpenJDK packages do (instead of using an embedded copy of regular zlib). - Fay [1] https://lists.reproducible-builds.org/pipermail/rb-general/2024-September/003547.html
Bug#1002056: Supporting alternative zlib implementations
* Guillem Jover [2024-09-25 01:55]: > Hi! > > On Wed, 2024-09-25 at 00:39:10 +0200, Fay Stegerman wrote: > > * Guillem Jover [2024-09-24 17:45]: > > > Personally, I think fully migrating from zlib to zlib-ng would sound > > > great (even for trixie), but I guess we can take it slow if you do not > > > feel confident or have concerns over this. > > > > As using an alternative zlib implementation could impact Reproducible Builds > > [1], I would recommend taking that into consideration before deciding on > > this > > kind of change. > > Ah, this is related to something I wanted to mention too and forgot. > > I don't think the specific case you mention is in itself a concern for > Debian, because we only guarantee reproducibility given the same inputs, > which includes the set of packages and their versions that were used > when building the binaries. So if there was a switch, those would end up > being recorded as well, and used when reproducing the outputs. And this > could also happen with a newer version of zlib itself. > > The problem though is, that because the compressed stream is going to > change, that can make certain test suites fail if we perform this > switch, which I think would be the main fallout that we'd see from > this and would need manual fixing, although I assume Fedora has probably > handled most of these already. For example when I added explicit > zlib-ng support to dpkg, I had to fix its test suite to parametrize > sizes for test artifacts. Whilst it indeed may not affect the reproducibility guarantees for Debian packages themselves, it does affect being able to use a Debian system for Reproducible Builds of other software for which the reference artefacts were built with regular zlib and thus can no longer be reproduced on Debian if that uses a different zlib implementation (so far I've only encountered the reverse, which seems relatively rare -- for now). For example, ZIP files or Android APKs built on a Debian system will have a different compressed stream, like the test files you mention. Which will likely break Reproducible Builds tooling like apksigcopier [1] and reproducible-apk-tools [2]. AFAIK all rebuilders (including my own [3]) for Android APKs use Debian base systems, so this could cause quite a bit of breakage for Reproducible Builds within that ecosystem, which is something I would like to avoid (or at least have a decent workaround for -- e.g. being able to easily choose between multiple zlib implementations during runtime in my Python tooling would be great). As you point out, we've been lucky that zlib has remained backwards-compatible for a long time (even though it doesn't provide any guarantees of that AFAIK). Which also makes me wonder how much more likely zlib-ng might be to produce different compressed streams between different versions or using different hardware (configurations). There might also be issues with reproducibility of Debian packages themselves if e.g. zlib-ng output can differ on different hardware (e.g. number of cores) even with an otherwise identical build environment. At the very least I think it would be good to know how all this could be affected (and how likely things are to remain as stable as zlib has been so far) before making a decision to switch. > I think it would be pretty easy to at least see the extent of this > fallout by performing a mass rebuild for packages build-depending > on zlib1g-dev with a zlib-ng version. - Fay [1] https://tracker.debian.org/pkg/apksigcopier [2] https://github.com/obfusk/reproducible-apk-tools [3] https://github.com/obfusk/rbtlog
Bug#1002056: Supporting alternative zlib implementations
* Guillem Jover [2024-09-24 17:45]: [...] > Personally, I think fully migrating from zlib to zlib-ng would sound > great (even for trixie), but I guess we can take it slow if you do not > feel confident or have concerns over this. As using an alternative zlib implementation could impact Reproducible Builds [1], I would recommend taking that into consideration before deciding on this kind of change. - Fay [1] https://lists.reproducible-builds.org/pipermail/rb-general/2024-September/003526.html
Bug#1078883: diffoscope: FTBFS: failing tests
* Fay Stegerman [2024-08-19 15:40]: > * Chris Lamb [2024-08-19 12:37]: > > Santiago Vila wrote: > > > > > E zipfile.BadZipFile: Overlapped entries: 'dir/text' (possible zip bomb) > > > > (FAO to help folks joining the thread: this was from a rebuild of > > stable, not of unstable. This bug does not affect unstable nor > > testing.) > > > > So, this is essentially the same issue as #1068705, which we believe > > was caused by a regression in CPython [0] which was, in turn, caused > > by an attempt to make the handling of .zip file safer [1]. > > > > We worked around this in diffoscope by catching the exception [2]. > > Then, we added a visible user note that we had done so [3]. > > > > I think we have four options: > > > > 1. Revert the security-related changes in CPython. > > 2. Write and apply a patch to CPython to fix the CPython regression. > > 3. Backport the two patches (or just the second [2]) to stable. > > 4. Do nothing and accept that diffoscope FTBFS in stable. > > > > (1) is pretty much a no-go, and then I don't think a patch to (2) will > > be forthcoming as I lack the confidence to safely write one. And (4) > > only works if we think that someone will effect (2) for us, will be > > backported by the CPython devs _and_ it will land in bookworm soon. A > > tall order. > > > > (3) is thus probably the best plan. The first (or both) of the > > linked changes [2][3] could straightforwardly and safely be backported > > to stable… if folks that it is justified. Let me know. > > > > [0] https://github.com/python/cpython/issues/117779 > > [1] https://github.com/python/cpython/pull/110016 > > [2] > > https://salsa.debian.org/reproducible-builds/diffoscope/commit/9c7e817c79f19e67e56d564b55b728a54a35423b > > [3] > > https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140/diffs > > Actually, whilst the cause is also cpython#110016, this is [1], not #1068705. > This one is not a regression (which presumably means (1) and (2) do not apply) > as it only affects our MozillaZipContainer monkeypatch, which is not exactly > supported by upstream cpython as these are not valid ZIP files. So the > relevant > patch to backport is [2]. > > Semi-related, I'm wondering if the removal of marshal.load() [3] is worth > backporting at the same time, given the security implications. > > - Fay > > [1] https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362 > [2] > https://salsa.debian.org/reproducible-builds/diffoscope/-/commit/cc3b077f6ef97b4e20036e9823926fe633c7d4d0 > [3] > https://salsa.debian.org/reproducible-builds/diffoscope/-/commit/e75871b07e09cfd778181d905f540a15bd71e63a For completeness: had this been #1068705 (or in case someone feels it worth to fix that one too since it is the result of the same Python change even though it's not related to the FTBFS), the most relevant patch would have been [1] (though catching the exception would at least prevent a crash). - Fay [1] https://salsa.debian.org/reproducible-builds/diffoscope/-/commit/945fd9faed3ae48d5f16602db1eb037c4785aeb5
Bug#1078883: diffoscope: FTBFS: failing tests
* Chris Lamb [2024-08-19 12:37]: > Santiago Vila wrote: > > > E zipfile.BadZipFile: Overlapped entries: 'dir/text' (possible zip bomb) > > (FAO to help folks joining the thread: this was from a rebuild of > stable, not of unstable. This bug does not affect unstable nor > testing.) > > So, this is essentially the same issue as #1068705, which we believe > was caused by a regression in CPython [0] which was, in turn, caused > by an attempt to make the handling of .zip file safer [1]. > > We worked around this in diffoscope by catching the exception [2]. > Then, we added a visible user note that we had done so [3]. > > I think we have four options: > > 1. Revert the security-related changes in CPython. > 2. Write and apply a patch to CPython to fix the CPython regression. > 3. Backport the two patches (or just the second [2]) to stable. > 4. Do nothing and accept that diffoscope FTBFS in stable. > > (1) is pretty much a no-go, and then I don't think a patch to (2) will > be forthcoming as I lack the confidence to safely write one. And (4) > only works if we think that someone will effect (2) for us, will be > backported by the CPython devs _and_ it will land in bookworm soon. A > tall order. > > (3) is thus probably the best plan. The first (or both) of the > linked changes [2][3] could straightforwardly and safely be backported > to stable… if folks that it is justified. Let me know. > > [0] https://github.com/python/cpython/issues/117779 > [1] https://github.com/python/cpython/pull/110016 > [2] > https://salsa.debian.org/reproducible-builds/diffoscope/commit/9c7e817c79f19e67e56d564b55b728a54a35423b > [3] > https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140/diffs Actually, whilst the cause is also cpython#110016, this is [1], not #1068705. This one is not a regression (which presumably means (1) and (2) do not apply) as it only affects our MozillaZipContainer monkeypatch, which is not exactly supported by upstream cpython as these are not valid ZIP files. So the relevant patch to backport is [2]. Semi-related, I'm wondering if the removal of marshal.load() [3] is worth backporting at the same time, given the security implications. - Fay [1] https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362 [2] https://salsa.debian.org/reproducible-builds/diffoscope/-/commit/cc3b077f6ef97b4e20036e9823926fe633c7d4d0 [3] https://salsa.debian.org/reproducible-builds/diffoscope/-/commit/e75871b07e09cfd778181d905f540a15bd71e63a
Bug#1078944: diffoscope fails to build from source and fails to run in debian sid/unstable (#389)
* Holger Levsen [2024-08-18 09:12]: > On Sun, Aug 18, 2024 at 07:10:45AM +, Holger Levsen wrote: > > On Sat, Aug 17, 2024 at 09:08:27PM +0000, FC (Fay) Stegerman (@obfusk) > > wrote: > > > FC (Fay) Stegerman commented: > > > https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/389#note_516514 > > > See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1078747 > > > > also pretty visible here: > > > > https://tests.reproducible-builds.org/debian/stats_breakages_100d.png > > filing this as a proper BTS bug to prevent migration to trixie. Am I missing something here? Because diffoscope 274 is already in trixie. And the issue isn't a result of a change to diffoscope itself but a result of python3-cryptography (imported by python3-pypdf, which diffoscope recommends) not working without openssl-provider-legacy (or a workaround). So presumably we will see the same issues in trixie when openssl migrates. Interestingly, python3-cryptography isn't actually a dependency of python3-pypdf (it recommends python3-pycryptodome instead). But it tries to load cryptography -- which something else must have pulled in -- first, and only catches ImportError, not this RuntimeError. - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-15 12:13]: > I've got two remaining questions about libscout (and diffoscope) > > On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote: > > unzip does seem to extract all the files, though it errors out. Not sure > > what > > diffoscope should do here. This is definitely a broken ZIP file. That bug > > should probably be reported against libscout or whatever tooling it used to > > create that JAR. > > you filed https://github.com/python/cpython/issues/117779 > (thanks again!), am I correct to assume that thus there's no need > to file a seperate bug against libscout? It's generating a broken ZIP file with duplicate entries. It really shouldn't be doing that, regardless of whether we can extract the files nonetheless. That's still a bug that should be reported and fixed. > and 2nd, > https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/libscout.html > now as expected displays: > > './usr/share/java/libscout.jar' has 35 duplicate entries > './usr/share/java/libscout.jar' has 35 duplicate entries > > (which is nice, though maybe could only been shown once?) Ah. It correctly shows that twice as there could be differences between the two files being compared wrt whether they have duplicate entries (and if so how many). And if you run 'diffoscope foo.zip bar.zip' it'll show those two different file names. But in this case we have nested archives and the path (and in this case also the number of duplicate entries) is identical for both, so maybe we can tweak the output to show which top-level file it belongs to? > but > https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/arm64/diffoscope-results/libscout.html > shows this: > > Command `'zipdetails --redact --scan --utc {}'` failed with exit code 255. > Standard output: [...] > though this later is done using diffoscope from unstable while the > rest of the userland is bullseye, so this might be expected as well? Ah. Looks like zipdetails(1) on bullseye doesn't support the --redact, --scan, and --utc options yet. - Fay
Bug#1068853: reprotest: SyntaxWarning: invalid escape sequence '\;'
* Vagrant Cascadian [2024-04-12 19:29]: > On 2024-04-12, Holger Levsen wrote: > > when installing reprotest 0.7.27: > > > > SyntaxWarning: invalid escape sequence '\;' > > Setting up reprotest (0.7.27) ... > > /usr/lib/python3/dist-packages/reprotest/__init__.py:360: SyntaxWarning: > > invalid escape sequence '\;' > > run_or_tee(['sh', '-ec', 'find %s -type f -exec sha256sum "{}" \;' % > > self.artifact_pattern], [...] > How exactly did you get this error? > > I installed locally, but did not encounter any such issues on package > installation just now, and also nothing when manually running a simple > test: > > reprotest 'date > date' date > WARNING:reprotest:The control build runs on 1 CPU by default, give --min-cpus > to increase this. > WARNING:reprotest.build:IGNORING user_group variation; supply more usergroups > with --variations=user_group.available+=USER1:GROUP1;USER2:GROUP2 or > alternatively, suppress this warning with --variations=-user_group > WARNING:reprotest.build:Not using sudo for domain_host; your build may fail. > See man page for other options. > WARNING:reprotest.build:Be sure to `echo 1 > > /proc/sys/kernel/unprivileged_userns_clone` if on a Debian system. > --- /tmp/tmp4vqq6736/control > +++ /tmp/tmp4vqq6736/experiment-1 > │ --- /tmp/tmp4vqq6736/control/source-root > ├── +++ /tmp/tmp4vqq6736/experiment-1/source-root > │ │ --- /tmp/tmp4vqq6736/control/source-root/date > │ ├── +++ /tmp/tmp4vqq6736/experiment-1/source-root/date > │ │ @@ -1 +1 @@ > │ │ +L 13 apr 2024 07:27:01 GMT > │ │ -Fri Apr 12 05:27:01 GMT 2024 That syntax warning is new in Python 3.12. And it's correct, one should use raw strings (r'...') or two backslashes for escape sequences intended for e.g. regexes or shell commands like here, not Python itself. - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-11 12:54]: > On Thu, Apr 11, 2024 at 11:28:19AM +0100, Chris Lamb wrote: > [...] > > Applied in Git with attribution taken from your email. > [...] > > Fixed as well. And it adds a nice comment displaying the issue. > > awesome, thank you both! The promised cpython issue: https://github.com/python/cpython/issues/117779 And a small follow-up: https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140 - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Fay Stegerman [2024-04-11 04:28]: > * Holger Levsen [2024-04-11 02:14]: > > > unzip does seem to extract all the files, though it errors out. Not sure > > > what > > > diffoscope should do here. This is definitely a broken ZIP file. That > > > bug > > > should probably be reported against libscout or whatever tooling it used > > > to > > > create that JAR. > > > > I agree it's more complicated, but fundamentally, diffoscope should *not* > > crash > > here! (but rather report the broken zip file.) > > I think we all agree it shouldn't crash :) > > What I meant is that I'm not sure it should simply catch the error, report the > file as broken, and not attempt extraction, or if it makes sense to attempt to > work around this issue, at least in cases like this specific one where the > entries are exact duplicates and the files can presumably be safely extracted. > I think my workaround (which could be implemented slightly differently as > well, > without modifying the ZipFile, but processing it differently in diffoscope) > would accomplish that for this JAR at least. I could make an MR for that. > Though as I said I will also report this upstream to cpython, probably > tomorrow. > > - Fay The attached patch avoids the crash in this case, FWIW. I would still recommend catching the error for other cases. - Fay diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py index 2a27042a..4bfb1527 100644 --- a/diffoscope/comparators/zip.py +++ b/diffoscope/comparators/zip.py @@ -182,7 +182,12 @@ class ZipDirectory(Directory, ArchiveMember): class ZipContainer(Archive): def open_archive(self): -return zipfile.ZipFile(self.source.path, "r") +zf = zipfile.ZipFile(self.source.path, "r") +self.name_to_info = {} +for info in zf.infolist(): +if info.filename not in self.name_to_info: +self.name_to_info[info.filename] = info +return zf def close_archive(self): self.archive.close() @@ -199,7 +204,8 @@ class ZipContainer(Archive): ).encode(sys.getfilesystemencoding(), errors="replace") try: -with self.archive.open(member_name) as source, open( +info = self.name_to_info[member_name] +with self.archive.open(info) as source, open( targetpath, "wb" ) as target: shutil.copyfileobj(source, target)
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-11 02:14]: > > unzip does seem to extract all the files, though it errors out. Not sure > > what > > diffoscope should do here. This is definitely a broken ZIP file. That bug > > should probably be reported against libscout or whatever tooling it used to > > create that JAR. > > I agree it's more complicated, but fundamentally, diffoscope should *not* > crash > here! (but rather report the broken zip file.) I think we all agree it shouldn't crash :) What I meant is that I'm not sure it should simply catch the error, report the file as broken, and not attempt extraction, or if it makes sense to attempt to work around this issue, at least in cases like this specific one where the entries are exact duplicates and the files can presumably be safely extracted. I think my workaround (which could be implemented slightly differently as well, without modifying the ZipFile, but processing it differently in diffoscope) would accomplish that for this JAR at least. I could make an MR for that. Though as I said I will also report this upstream to cpython, probably tomorrow. - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Fay Stegerman [2024-04-11 01:48]: > * Holger Levsen [2024-04-10 19:43]: > > On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote: > > > Holger Levsen wrote: > > > > > > > when building libscout 2.3.2-3 on current unstable, the result is also > > > > unreproducible, but diffoscope crashes when analysing the diff. > > > I think this is somewhat related to: > > > https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362 > > > … which was said to be fixed by Fay in > > > cc3b077f6ef97b4e20036e9823926fe633c7d4d0 > > > that released as diffoscope version 263 on 2024-04-05. > > > However, I can see that the current output of libscout/amd64 on > > > tests.reproducible-builds.org is failing with this very version: > > > > yes, indeed. > > > > also, this happened before too, I'm sure about at least with diffoscope 260 > > already. > > > > > Will loop Fay in via Salsa presently. > > > > thank you! > > Salsa is probably better for figuring out what to do next, but I get these > mails > too :) > > The libscout.jar has duplicate ZIP entries in the central directory, pointing > to > the same actual entry in the ZIP. So the "overlapped entries" error is > entirely > correct, even if it's not a zip bomb. > > >>> import zipfile > >>> zf = zipfile.ZipFile("libscout.jar") > >>> fh = zf.open("javax/annotation/CheckForNull.class") > zipfile.BadZipFile: Overlapped entries: > 'javax/annotation/CheckForNull.class' (possible zip bomb) [...] I do have a workaround of sorts for this specific case of duplicate entries. I'll open a cpython issue to report it to upstream. Though they may not consider this a bug, possibly even the correct behaviour. Not sure myself tbh :) >>> for info in reversed(zf.infolist()): ... zf.NameToInfo[info.filename] = info >>> fh = zf.open("javax/annotation/CheckForNull.class") # works now - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-10 19:43]: > On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote: > > Holger Levsen wrote: > > > > > when building libscout 2.3.2-3 on current unstable, the result is also > > > unreproducible, but diffoscope crashes when analysing the diff. > > I think this is somewhat related to: > > https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362 > > … which was said to be fixed by Fay in > > cc3b077f6ef97b4e20036e9823926fe633c7d4d0 > > that released as diffoscope version 263 on 2024-04-05. > > However, I can see that the current output of libscout/amd64 on > > tests.reproducible-builds.org is failing with this very version: > > yes, indeed. > > also, this happened before too, I'm sure about at least with diffoscope 260 > already. > > > Will loop Fay in via Salsa presently. > > thank you! Salsa is probably better for figuring out what to do next, but I get these mails too :) The libscout.jar has duplicate ZIP entries in the central directory, pointing to the same actual entry in the ZIP. So the "overlapped entries" error is entirely correct, even if it's not a zip bomb. >>> import zipfile >>> zf = zipfile.ZipFile("libscout.jar") >>> fh = zf.open("javax/annotation/CheckForNull.class") zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb) >>> len([i for i in zf.infolist() if i.filename == "javax/annotation/CheckForNull.class"]) 2 >>> len(zf.namelist()) - len(set(zf.namelist())) 35 >>> x, y = [i for i in zf.infolist() if i.filename == "javax/annotation/CheckForNull.class"] >>> x.header_offset 23065534 >>> y.header_offset 23065534 >>> x._end_offset 23065890 >>> y._end_offset 23065534 >>> zf.open(x) >>> zf.open(y) Traceback (most recent call last): zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb) $ unzip -q -d foo libscout.jar error: invalid zip file with overlapped components (possible zip bomb) unzip does seem to extract all the files, though it errors out. Not sure what diffoscope should do here. This is definitely a broken ZIP file. That bug should probably be reported against libscout or whatever tooling it used to create that JAR. FWIW, it seems the libscout.jar files in both .deb files are identical apart from timestamps and the ordering of entries in the ZIP. - Fay