Re: Reproducible Builds Status Summary for Guix
On 2022-06-12, Vagrant Cascadian wrote:
> I've been working on Reproducible Builds in guix a fair amount this
> month.

I did another round of this... I fixed a few packages recently, and
noticed some other people fixing packages too, yay!

As of this moment for x86_64, it looks like:

* ~83% matching (a.k.a. reproducible) for 18920 packages
* ~6% not matching (a.k.a. NOT reproducible) for 1337 packages
* ~11% unknown (e.g. not built on both build farms) for 2440 packages

  https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

Ignoring the pesky unknown packages, it is more like ~93% reproducible
and ~7% unreproducible... that feels a bit better to me!

These numbers wander around over time, mostly due to packages moving
back into an "unknown" state while the build farms catch up with each
other... although the above numbers seem to have been pretty consistent
over the last few days.

> Some rough summaries about the types of issues:
>
> * ecl-* packages account for nearly half of the issues (~500 out of
>   ~1000 packages)

More like ~570 out of ~1300 this time. Apparently there is an upstream
issue for ecl, which is referenced in the summary.

> * ~850 packages categorized (ecl-* accounting for most of them)

~990 packages reviewed (many duplicates from previous run).

Slightly more to review this time, mostly due to some previous unknowns
now being classified as reproducible or not reproducible.

There are a handful of older versions of things (e.g. package@1.0 vs
package@2.0) that fail to build reproducibly and I didn't bother to
look; I only checked the most recent versions of packages, so there are
probably 300+ packages that could be reviewed.
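As a quick check of the percentages quoted earlier in this message, here is a minimal sketch of the arithmetic. It assumes only the three package counts given above; the helper name `pct` and the variable names are made up for illustration and are not part of any Guix tooling.

```shell
#!/bin/sh
# Recompute the shares from the three raw counts quoted in the message.
matching=18920       # reproducible
not_matching=1337    # not reproducible
unknown=2440         # not yet built on both build farms

# pct PART TOTAL: print PART as a percentage of TOTAL, one decimal place.
# (Hypothetical helper, not an existing command.)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

all=$((matching + not_matching + unknown))
known=$((matching + not_matching))

echo "of all $all packages: $(pct "$matching" "$all")% matching, $(pct "$not_matching" "$all")% not matching, $(pct "$unknown" "$all")% unknown"
echo "of the $known known packages: $(pct "$matching" "$known")% matching, $(pct "$not_matching" "$known")% not matching"
```

Rounded to whole numbers, the results agree with the ~83/6/11 split over all packages and the ~93/7 split once the unknown packages are left out.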
> * 19 packages embed kernel version

22 kernel version

> * 63 packages embed timestamps

92 timestamps

> * 52 packages embed dates (harder to reproduce than full timestamps)

46 dates

> * 5 timestamps in python .pyc files

7 .pyc timestamps

> * 12 timestamps in .jar files

12 .jar timestamps

> * 66 ordering issues

82 ordering

> * 3 ordering issues in .pyc files

3 .pyc ordering

> * 9 ordering in .jar files

10 .jar ordering

> * 16 ordering in guile .go files

13 guile .go ordering

> * ~160 largely unidentified and inscrutable issues

193 unidentified

> This does reveal that there are some opportunities for toolchain fixes,
> fixing multiple packages at a time (and future packages too!), such as
> ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
> proposal
> https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).

Still true! I tried patching texlive directly and failed to come up with
something that worked, but haven't tried again recently.

> I haven't done extensive cross-referencing with other distros, but
> suspect there may be patches to fix some of these toolchain issues... If
> you're savvy with any of the above languages, help fixing toolchain
> issues would be amazing!

Did a little of this, but still more to do!

> If you're looking to get your hands dirty with some reproducibility
> fixes in guix, a fair number of the timestamp, date and kernel version
> fixes are likely fairly easy, but you generally have to manually verify
> that the date or kernel version aren't embedded, as "guix build
> --rounds=2" will likely happen with the same kernel version and date.

Still very true! Maybe I should arrange a little virtual hackfest or
something...

I should probably normalize these issues a bit more and simplify them,
but the full list I looked at should be attached.

Would it be ok to maintain this and some of the relevant tooling in a
branch in guix.git, say, "reproducibility-notes"? Or make a new
repository just for this?

It most likely wouldn't share history with the other branches (much like
the "keyring" branch), but presumably won't grow too large either.

live well,
  vagrant

Attachment: guix-rb-notes.yml
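For the idea of a branch that shares no history with the others, a rough sketch of how such a branch could be started with plain git follows. Nothing here is decided: the function name `start_notes_branch` is hypothetical, the branch and file names are just the ones floated above, and it is best tried in a fresh clone since it empties the work tree of tracked files.

```shell
#!/bin/sh
# start_notes_branch BRANCH NOTES_FILE
# Create BRANCH with no parent commit and add a copy of NOTES_FILE to it.
# NOTES_FILE should live outside the work tree (e.g. /tmp/guix-rb-notes.yml).
# (Hypothetical helper for illustration only.)
start_notes_branch() {
    branch=$1
    notes=$2
    # A branch created with --orphan has no history until its first commit.
    git checkout --quiet --orphan "$branch" || return 1
    # The index still lists the previous branch's files; drop them so the
    # new branch starts out empty.
    git rm -r --force --quiet --ignore-unmatch . || return 1
    cp "$notes" . || return 1
    git add -- "$(basename "$notes")" || return 1
    git commit --quiet -m "Add initial reproducibility notes."
}

# Example (not run here):
#   start_notes_branch reproducibility-notes /tmp/guix-rb-notes.yml
```

Because the first commit has no parent, `git merge-base` between this branch and any other branch finds no common ancestor, which is the "doesn't share history" property.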
Re: Reproducible Builds Status Summary for Guix
Guillaume Le Vaillant writes:

> Ludovic Courtès writes:
>
>> Hi!
>>
>> Vagrant Cascadian writes:
>>
>>> Some rough summaries about the types of issues:
>>>
>>> * ecl-* packages account for nearly half of the issues (~500 out of
>>>   ~1000 packages)
>>
>> This seems to be a problem with generated identifiers at first sight;
>> would be worth taking upstream.  Any Common Lisper here?  :-)
>
> Hi,
> There's an open issue about this upstream [1].
>
> [1] https://gitlab.com/embeddable-common-lisp/ecl/-/issues/551

Nice, kudos for tracking it down and coming up with a fix!

Ludo’.
Re: Reproducible Builds Status Summary for Guix
Ludovic Courtès writes:

> Hi!
>
> Vagrant Cascadian writes:
>
>> Some rough summaries about the types of issues:
>>
>> * ecl-* packages account for nearly half of the issues (~500 out of
>>   ~1000 packages)
>
> This seems to be a problem with generated identifiers at first sight;
> would be worth taking upstream.  Any Common Lisper here?  :-)

Hi,
There's an open issue about this upstream [1].

[1] https://gitlab.com/embeddable-common-lisp/ecl/-/issues/551
Re: Reproducible Builds Status Summary for Guix
Hi!

Vagrant Cascadian writes:

> I've been working on Reproducible Builds in guix a fair amount this
> month.
>
> data.guix.gnu.org has proven invaluable for this work, big thanks for
> that!
>
>   https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

Neat!

> A few times I ran into disk space issues, due to:
>
>   guix challenge with diffoscope fails to clean up temporary directory
>   https://issues.guix.gnu.org/55809

Should be fixed now.  :-)

> Some rough summaries about the types of issues:
>
> * ecl-* packages account for nearly half of the issues (~500 out of
>   ~1000 packages)

This seems to be a problem with generated identifiers at first sight;
would be worth taking upstream.  Any Common Lisper here?  :-)

> * ~850 packages categorized (ecl-* accounting for most of them)
>
> * 19 packages embed kernel version
>
> * 63 packages embed timestamps
>
> * 52 packages embed dates (harder to reproduce than full timestamps)
>
> * 5 timestamps in python .pyc files
>
> * 12 timestamps in .jar files
>
> * 66 ordering issues
>
> * 3 ordering issues in .pyc files
>
> * 9 ordering in .jar files
>
> * 16 ordering in guile .go files
>
> * ~160 largely unidentified and inscrutable issues
>
> That's unfortunately a lot of "unidentified" issues, but I figured I'd
> at least mark the ones I looked at.

Yes, that’s already an insightful breakdown.

> There is a rough proposal for using a multi-project "notes" format that
> debian uses:
>
>   https://salsa.debian.org/reproducible-builds/reproducible-notes/-/tree/master
>
>   https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/multi-project-syntax/ideas_on_sharing_notes_between_distros
>
> ... back in 2016, and touched on at later Reproducible Builds summits,
> but not really adopted as far as I know. But I know some of the issues
> are essentially the same across distros; yet some are surprisingly
> different even with the same source code!
I was very optimistic about using this database cross-distro back at the
first R-B Summit!  I still look at it occasionally when an issue pops
up, but it has not become the collaborative platform we were hoping for.
It’s never too late though!

(Debian is in a sense stricter in that some things can be an issue there
(like store build file names) and not here, because the Guix build
environment is controlled and “canonicalized”.  So not all the issues in
there are relevant to us I guess.)

Thanks for the update!

Ludo’.
Reproducible Builds Status Summary for Guix
I've been working on Reproducible Builds in guix a fair amount this
month.

data.guix.gnu.org has proven invaluable for this work, big thanks for
that!

  https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

I have cataloged many of the packages that are identified by downloading
a .json file:

  https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-derivation-outputs.json?output_consistency=not-matching&system=x86_64-linux&target=none&field=no-additional-fields&limit_results=1

And then running those packages in a guix challenge for loop...

  # Run "guix challenge" with diffoscope on each package given as an
  # argument, keeping one zstd-compressed log per package.
  for a in "$@" ; do
      diffoscope_out="${a}.diffoscope"
      diffoscope_out_comp="${diffoscope_out}.zst"
      if [ -s "${diffoscope_out_comp}" ] ; then
          # A non-empty compressed log means this package was already done.
          echo "${diffoscope_out_comp} already present, skipping..."
      else
          guix challenge --verbose --diff=diffoscope "${a}" 2>&1 | tee "${diffoscope_out}"
          # Compress non-empty logs; --rm deletes the uncompressed copy.
          test -s "${diffoscope_out}" && zstd --rm --threads=0 "${diffoscope_out}"
      fi
  done

A few times I ran into disk space issues, due to:

  guix challenge with diffoscope fails to clean up temporary directory
  https://issues.guix.gnu.org/55809

So I had to manually clean up some files and re-run it a few times, and
probably missed a few packages...

I've looked at each of these diffoscope outputs and tried to quickly
categorize them. Attached is a .yaml file (we cannot possibly have enough
different file formats!) that includes a rough identifier for each
issue. It was a rough and quick best-effort pass through, so there may
be some discrepancies...

I've already pushed fixes for a handful of packages, and tried to
remember to mark them as fixed. I've probably left many of the fixed
ones out of this list, but I'm not terribly worried about that.
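One possible way to avoid the manual cleanup mentioned above is to give each run its own scratch directory and delete it afterwards. This is only a sketch: the wrapper name `with_private_tmpdir` is invented here, and it rests on the unverified assumption that guix challenge and diffoscope put their temporary files under `$TMPDIR`.

```shell
#!/bin/sh
# with_private_tmpdir COMMAND [ARGS...]
# Run COMMAND with TMPDIR pointing at a fresh directory, then remove that
# directory whether or not COMMAND succeeded.  Returns COMMAND's status.
# (Hypothetical helper; assumes the command honors TMPDIR.)
with_private_tmpdir() {
    scratch=$(mktemp -d) || return 1
    status=0
    env TMPDIR="$scratch" "$@" || status=$?
    rm -rf -- "$scratch"
    return "$status"
}

# Example (not run here), using the same flags as the loop above:
#   with_private_tmpdir guix challenge --verbose --diff=diffoscope "$a"
```

Inside the loop, the wrapper would go in front of the `guix challenge` invocation, so leftovers from one package cannot pile up while the next one runs.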
Some rough summaries about the types of issues:

* ecl-* packages account for nearly half of the issues (~500 out of
  ~1000 packages)

* ~850 packages categorized (ecl-* accounting for most of them)

* 19 packages embed kernel version

* 63 packages embed timestamps

* 52 packages embed dates (harder to reproduce than full timestamps)

* 5 timestamps in python .pyc files

* 12 timestamps in .jar files

* 66 ordering issues

* 3 ordering issues in .pyc files

* 9 ordering in .jar files

* 16 ordering in guile .go files

* ~160 largely unidentified and inscrutable issues

That's unfortunately a lot of "unidentified" issues, but I figured I'd
at least mark the ones I looked at.

This does reveal that there are some opportunities for toolchain fixes,
fixing multiple packages at a time (and future packages too!), such as
ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
proposal
https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).

I haven't done extensive cross-referencing with other distros, but
suspect there may be patches to fix some of these toolchain issues... If
you're savvy with any of the above languages, help fixing toolchain
issues would be amazing!

I'm not sure where to collaborate on this stuff; I've just got a local
git repository and it's a bit rough. I could also push a branch to
guix.git with something like this in it.

There is a rough proposal for using a multi-project "notes" format that
debian uses:

  https://salsa.debian.org/reproducible-builds/reproducible-notes/-/tree/master

  https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/multi-project-syntax/ideas_on_sharing_notes_between_distros

... back in 2016, and touched on at later Reproducible Builds summits,
but not really adopted as far as I know. But I know some of the issues
are essentially the same across distros; yet some are surprisingly
different even with the same source code!
If you're looking to get your hands dirty with some reproducibility
fixes in guix, a fair number of the timestamp, date and kernel version
fixes are likely fairly easy, but you generally have to manually verify
that the date or kernel version aren't embedded, as "guix build
--rounds=2" will likely happen with the same kernel version and date.

Will be curious to see any new and exciting issues after the staging
merge!

live well,
  vagrant

Attachment: guix-rb-notes.yml
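For the manual verification step described above, one simple approach is to search a build output for the literal kernel release or date string. The sketch below assumes plain grep is good enough for a first pass; the function name `files_embedding` and the package name in the usage comment are illustrative only.

```shell
#!/bin/sh
# files_embedding NEEDLE DIR
# List files under DIR that contain the literal string NEEDLE.
# Exits non-zero when nothing matches.
# (Hypothetical helper for illustration only.)
files_embedding() {
    needle=$1
    dir=$2
    # -r recurse, -l print file names only, -a treat binary files as text,
    # -F match NEEDLE literally rather than as a regular expression.
    grep -rlaF -- "$needle" "$dir"
}

# Example (not run here; "hello" is just a sample package):
#   out=$(guix build hello)
#   files_embedding "$(uname -r)" "$out"         # running kernel release
#   files_embedding "$(date +%Y-%m-%d)" "$out"   # today's date, one of many formats
```

An empty result is only a hint, not proof: the value may be stored in another format or inside a compressed file, so comparing against a build made on a different day or kernel remains the stronger check.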