Re: Reproducible Builds Status Summary for Guix

2022-08-21 Thread Vagrant Cascadian
On 2022-06-12, Vagrant Cascadian wrote:
> I've been working on Reproducible Builds in guix a fair amount this
> month.

I did another round of this...

I fixed a few packages recently, and noticed some other people fixing
packages too, yay!

As of this moment for x86_64, it looks like:

* ~83% matching (a.k.a. reproducible) for 18920 packages
* ~6% not matching (a.k.a. NOT reproducible) for 1337 packages
* ~11% unknown (e.g. not built on both build farms) for 2440 packages

  
https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

Ignoring the pesky unknown packages, it is more like ~93% reproducible
and ~7% unreproducible... that feels a bit better to me!
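
For anyone wanting to double-check the arithmetic, here is a minimal
sketch that recomputes those percentages from the three package counts
listed above. The "percent" helper and the variable names are made up
for this illustration; only the counts themselves come from the summary.

```shell
# Percentage of PART out of TOTAL, rounded to one decimal place.
percent() {
    LC_ALL=C awk -v part="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * part / total }'
}

matching=18920
not_matching=1337
unknown=2440

total=$((matching + not_matching + unknown))   # every package
known=$((matching + not_matching))             # built on both build farms

echo "matching, of all packages:      $(percent "$matching" "$total")%"
echo "not matching, of all packages:  $(percent "$not_matching" "$total")%"
echo "unknown, of all packages:       $(percent "$unknown" "$total")%"
echo "matching, ignoring unknown:     $(percent "$matching" "$known")%"
echo "not matching, ignoring unknown: $(percent "$not_matching" "$known")%"
```

The last two lines use only the packages with a known status as the
denominator, which is where the ~93% / ~7% split comes from.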

These numbers wander around over time, mostly due to packages moving
back into an "unknown" state while the build farms catch up with each
other... although the above numbers seem to have been pretty consistent
over the last few days.


> Some rough summaries about the types of issues:
>
>   * ecl-* packages account for nearly half of the issues (~500 out of
> ~1000 packages)

More like ~570 out of ~1300 this time.

Apparently there is an upstream issue for ecl, which is referenced in
the summary.

>   * ~850 packages categorized (ecl-* accounting for most of them)

~990 packages reviewed (many duplicates from the previous run). Slightly
more to review this time, mostly due to some previous unknowns now
showing up as reproducible/not reproducible.

There are a handful of older versions of things (e.g. package@1.0 vs
package@2.0) that fail to build reproducibly, but I didn't bother to
look at those; I only checked the most recent versions of packages, so
there are probably 300+ packages that could still be reviewed.

>   * 19 packages embed kernel version

22 kernel version

>   * 63 packages embed timestamps

92 timestamps

>   * 52 packages embed dates (harder to reproduce than full timestamps)

46 dates

>   * 5 timestamps in python .pyc files

7 .pyc timestamps

>   * 12 timestamps in .jar files

12 .jar timestamps

>   * 66 ordering issues

82 ordering

>   * 3 ordering issues in .pyc files

3 .pyc ordering

>   * 9 ordering in .jar files

10 .jar ordering

>   * 16 ordering in guile .go files

13 guile .go ordering

>   * ~160 largely unidentified and inscrutable issues

193 unidentified


> This does reveal that there are some opportunities for toolchain fixes,
> fixing multiple packages at a time (and future packages too!), such as
> ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
> proposal
> https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).

Still true!

I tried patching texlive directly and failed to come up with something
that worked, but haven't tried again recently.


> I haven't done extensive cross-referencing with other distros, but
> suspect there may be patches to fix some of these toolchain issues... If
> you're savvy with any of the above languages, help fixing toolchain
> issues would be amazing!

Did a little of this, but still more to do!


> If you're looking to get your hands dirty with some reproducibility
> fixes in guix, a fair number of the timestamp, date and kernel version
> fixes are likely fairly easy, but you generally have to manually verify
> that the date or kernel version aren't embedded, as "guix build
> --rounds=2" will likely happen with the same kernel version and date.

Still very true! Maybe I should arrange a little virtual hackfest or
something...

I should probably normalize these issues a bit more and simplify them,
but the full list I looked at should be attached.

Would it be ok to maintain this and some of the relevant tooling in a
branch in guix.git, say, "reproducibility-notes"? Or make a new
repository just for this? It most likely wouldn't share history with the
other branches (much like the "keyring" branch), but presumably won't
grow too large either.


live well,
  vagrant


guix-rb-notes.yml
Description: Binary data




Re: Reproducible Builds Status Summary for Guix

2022-06-17 Thread Ludovic Courtès
Guillaume Le Vaillant wrote:

> Ludovic Courtès wrote:
>
>> Hi!
>>
>> Vagrant Cascadian wrote:
>>
>>> Some rough summaries about the types of issues:
>>>
>>>   * ecl-* packages account for nearly half of the issues (~500 out of
>>> ~1000 packages)
>>
>> This seems to be a problem with generated identifiers at first sight;
>> would be worth taking upstream.  Any Common Lisper here?  :-)
>>
>
> Hi,
> There's an open issue about this upstream [1].
>
> [1] https://gitlab.com/embeddable-common-lisp/ecl/-/issues/551

Nice, kudos for tracking it down and coming up with a fix!

Ludo’.



Re: Reproducible Builds Status Summary for Guix

2022-06-15 Thread Guillaume Le Vaillant
Ludovic Courtès wrote:

> Hi!
>
> Vagrant Cascadian wrote:
>
>> Some rough summaries about the types of issues:
>>
>>   * ecl-* packages account for nearly half of the issues (~500 out of
>> ~1000 packages)
>
> This seems to be a problem with generated identifiers at first sight;
> would be worth taking upstream.  Any Common Lisper here?  :-)
>

Hi,
There's an open issue about this upstream [1].

[1] https://gitlab.com/embeddable-common-lisp/ecl/-/issues/551




Re: Reproducible Builds Status Summary for Guix

2022-06-15 Thread Ludovic Courtès
Hi!

Vagrant Cascadian wrote:

> I've been working on Reproducible Builds in guix a fair amount this
> month.
>
> data.guix.gnu.org has proven invaluable for this work, big thanks for
> that!
>
>   
> https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

Neat!

> A few times I ran into disk space issues, due to:
>
>   guix challenge with diffoscope fails to clean up temporary directory
>   https://issues.guix.gnu.org/55809

Should be fixed now.  :-)

> Some rough summaries about the types of issues:
>
>   * ecl-* packages account for nearly half of the issues (~500 out of
> ~1000 packages)

This seems to be a problem with generated identifiers at first sight;
would be worth taking upstream.  Any Common Lisper here?  :-)

>   * ~850 packages categorized (ecl-* accounting for most of them)
>
>   * 19 packages embed kernel version
>
>   * 63 packages embed timestamps
>
>   * 52 packages embed dates (harder to reproduce than full timestamps)
>
>   * 5 timestamps in python .pyc files
>
>   * 12 timestamps in .jar files
>
>   * 66 ordering issues
>
>   * 3 ordering issues in .pyc files
>
>   * 9 ordering in .jar files
>
>   * 16 ordering in guile .go files
>
>   * ~160 largely unidentified and inscrutable issues
>
> That's unfortunately a lot of "unidentified" issues, but I figured I'd
> at least mark the ones I looked at.

Yes, that’s already an insightful breakdown.

> There is a rough proposal for using a multi-project "notes" format that
> debian uses:
>
>   
> https://salsa.debian.org/reproducible-builds/reproducible-notes/-/tree/master
>   
> https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/multi-project-syntax/ideas_on_sharing_notes_between_distros
>
> ... back in 2016, and touched on at later Reproducible Builds summits,
> but not really adopted as far as I know. But I know some of the issues
> are essentially the same across distros; yet some are surprisingly
> different even with the same source code!

I was very optimistic about using this database cross-distro back at the
first R-B Summit!  I still look at it occasionally when an issue pops
up, but it’s not become the collaborative platform we were hoping for.
It’s never too late though!

(Debian is in a sense stricter in that some things can be an issue there
(like store build file names) and not here, because the Guix build
environment is controlled and “canonicalized”.  So not all the issues in
there are relevant to us I guess.)

Thanks for the update!

Ludo’.



Reproducible Builds Status Summary for Guix

2022-06-12 Thread Vagrant Cascadian
I've been working on Reproducible Builds in guix a fair amount this
month.

data.guix.gnu.org has proven invaluable for this work, big thanks for
that!

  
https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility


I have cataloged many of the packages that are identified by
downloading a .json file:

  
https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-derivation-outputs.json?output_consistency=not-matching&system=x86_64-linux&target=none&field=no-additional-fields&limit_results=1

And then running those packages in a guix challenge for loop...

  for a in "$@" ; do
      diffoscope_out="${a}.diffoscope"
      diffoscope_out_comp="${diffoscope_out}.zst"
      package="${a}"
      if [ -s "${diffoscope_out_comp}" ] ; then
          echo "${diffoscope_out_comp} already present, skipping..."
      else
          # Save the challenge output, including any diffoscope report.
          guix challenge --verbose --diff=diffoscope "${a}" 2>&1 \
              | tee "${diffoscope_out}"
          # Compress non-empty results, removing the uncompressed copy.
          test -s "${diffoscope_out}" && zstd --rm --threads=0 "${diffoscope_out}"
      fi
  done

A few times I ran into disk space issues, due to:

  guix challenge with diffoscope fails to clean up temporary directory
  https://issues.guix.gnu.org/55809

So had to manually clean up some files and re-run it a few times and
probably missed a few packages...


I've looked at each of these diffoscope outputs and tried to quickly
categorize them. Attached a .yaml file (we cannot possibly have enough
different file formats!) that includes a rough identifier for each
issue. It was a rough and quick best-effort pass through, so there may
be some discrepancies...


I've already pushed fixes for a handful of packages, and tried to
remember to mark them as fixed. I've probably left many of the fixed
ones out of this list, but not terribly worried about that.

Some rough summaries about the types of issues:

  * ecl-* packages account for nearly half of the issues (~500 out of
~1000 packages)

  * ~850 packages categorized (ecl-* accounting for most of them)

  * 19 packages embed kernel version

  * 63 packages embed timestamps

  * 52 packages embed dates (harder to reproduce than full timestamps)

  * 5 timestamps in python .pyc files

  * 12 timestamps in .jar files

  * 66 ordering issues

  * 3 ordering issues in .pyc files

  * 9 ordering in .jar files

  * 16 ordering in guile .go files

  * ~160 largely unidentified and inscrutable issues

That's unfortunately a lot of "unidentified" issues, but I figured I'd
at least mark the ones I looked at.
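
To illustrate the "ordering" category in general terms (this is not
taken from any particular package in the notes): one common cause is a
build step that processes files in whatever order the filesystem happens
to return them. A minimal sketch of the usual remedy, sorting in a fixed
locale, with a hypothetical "sorted_files" helper:

```shell
# Print the regular files under DIR, one per line, in plain byte order.
# LC_ALL=C keeps the order independent of the builder's locale, and the
# explicit sort removes the dependency on directory read order.
sorted_files() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}
```

A list produced this way can then be fed to whatever tool builds the
archive or index, so the same inputs always end up in the same order.
File names containing newlines would need a NUL-separated variant.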

This does reveal that there are some opportunities for toolchain fixes,
fixing multiple packages at a time (and future packages too!), such as
ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
proposal
https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).
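
As a rough sketch of what such fixes usually amount to for the timestamp
and date categories: the SOURCE_DATE_EPOCH convention from
reproducible-builds.org lets a tool take its notion of "now" from the
environment instead of the clock. The "build_date" helper below is
hypothetical and not taken from any of the packages above; it only shows
the pattern.

```shell
# Print the build date in UTC as YYYY-MM-DD, taken from SOURCE_DATE_EPOCH
# (seconds since the Unix epoch) when it is set, else from the clock.
build_date() {
    epoch="${SOURCE_DATE_EPOCH:-$(date +%s)}"
    # GNU date accepts -d "@SECONDS"; BSD date uses -r SECONDS instead.
    date -u -d "@${epoch}" +%Y-%m-%d 2>/dev/null \
        || date -u -r "${epoch}" +%Y-%m-%d
}
```

Forcing UTC matters as much as the fixed epoch: without it, the same
timestamp can still print as a different date depending on the builder's
timezone.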

I haven't done extensive cross-referencing with other distros, but
suspect there may be patches to fix some of these toolchain issues... If
you're savvy with any of the above languages, help fixing toolchain
issues would be amazing!


I'm not sure where to collaborate on this stuff, I've just got a local
git repository and it's a bit rough. I could also push a branch to
guix.git with something like this in it.

There is a rough proposal for using a multi-project "notes" format that
debian uses:

  https://salsa.debian.org/reproducible-builds/reproducible-notes/-/tree/master
  
https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/multi-project-syntax/ideas_on_sharing_notes_between_distros

... back in 2016, and touched on at later Reproducible Builds summits,
but not really adopted as far as I know. But I know some of the issues
are essentially the same across distros; yet some are surprisingly
different even with the same source code!


If you're looking to get your hands dirty with some reproducibility
fixes in guix, a fair number of the timestamp, date and kernel version
fixes are likely fairly easy, but you generally have to manually verify
that the date or kernel version aren't embedded, as "guix build
--rounds=2" will likely happen with the same kernel version and date.
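
A minimal sketch of that manual check, assuming it is enough to search
the build output for the literal kernel version or date string. The
"files_embedding" helper is hypothetical, not an existing guix command:

```shell
# List the files under DIR whose contents include the literal NEEDLE.
#   -r  recurse into DIR        -l  print matching file names only
#   -a  treat binaries as text  -F  match NEEDLE literally, not as a regex
files_embedding() {
    grep -rlaF -- "$2" "$1"
}
```

For example, with "hello" standing in for the package under review (and
assuming it has a single output):

  files_embedding "$(guix build hello)" "$(uname -r)"
  files_embedding "$(guix build hello)" "$(date +%Y-%m-%d)"

An empty result only rules out these exact spellings; dates in
particular can be formatted in many other ways.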


Will be curious to see any new and exciting issues after the staging
merge!


live well,
  vagrant


guix-rb-notes.yml
Description: Binary data

