Reproducible Builds Status Summary for Guix

2022-06-12 Thread Vagrant Cascadian
I've been working on Reproducible Builds in guix a fair amount this
month.

data.guix.gnu.org has proven invaluable for this work, big thanks for
that!

  
https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility


I have cataloged many of the packages that are identified by
downloading a .json file:

  
https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-derivation-outputs.json?output_consistency=not-matching=x86_64-linux=none=no-additional-fields_results=1'

And then running those packages in a guix challenge for loop...

  for a in "$@" ; do
    # One diffoscope report per package, compressed once it is complete.
    diffoscope_out="${a}.diffoscope"
    diffoscope_out_comp="${diffoscope_out}.zst"
    if [ -s "${diffoscope_out_comp}" ] ; then
      echo "${diffoscope_out_comp} already present, skipping..."
    else
      guix challenge --verbose --diff=diffoscope "${a}" 2>&1 | tee "${diffoscope_out}"
      # Compress (and remove the original) only if something was written.
      test -s "${diffoscope_out}" && zstd --rm --threads=0 "${diffoscope_out}"
    fi
  done

A few times I ran into disk space issues, due to:

  guix challenge with diffoscope fails to clean up temporary directory
  https://issues.guix.gnu.org/55809

So had to manually clean up some files and re-run it a few times and
probably missed a few packages...
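
Since some runs were interrupted, one way to spot the packages that still lack a result is to look for the compressed file the loop above leaves behind. This is only a minimal sketch: it assumes the `${a}.diffoscope.zst` naming from that loop and that it is run in the directory holding the results, and the name `missing_outputs` is made up for illustration.

```shell
# Print every package name given as an argument that has no non-empty
# <name>.diffoscope.zst file in the current directory.
# (missing_outputs is a hypothetical helper, not part of guix.)
missing_outputs() {
    for pkg in "$@"; do
        if [ ! -s "${pkg}.diffoscope.zst" ]; then
            printf '%s\n' "${pkg}"
        fi
    done
}
```

The printed names could then be fed back to the loop. Packages for which the challenge wrote no output at all would be listed as well, so the result is an upper bound on what was missed.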


I've looked at each of these diffoscope outputs and tried to quickly
categorize them. Attached a .yaml file (we cannot possibly have enough
different file formats!) that includes a rough identifier for each
issue. It was a rough and quick best-effort pass through, so there may
be some discrepancies...


I've already pushed fixes for a handful of packages, and tried to
remember to mark them as fixed. I've probably left many of the fixed
ones out of this list, but not terribly worried about that.

Some rough summaries about the types of issues:

  * ecl-* packages account for nearly half of the issues (~500 out of
~1000 packages)

  * ~850 packages categorized (ecl-* accounting for most of them)

  * 19 packages embed kernel version

  * 63 packages embed timestamps

  * 52 packages embed dates (harder to reproduce than full timestamps)

  * 5 timestamps in python .pyc files

  * 12 timestamps in .jar files

  * 66 ordering issues

  * 3 ordering issues in .pyc files

  * 9 ordering in .jar files

  * 16 ordering in guile .go files

  * ~160 largely unidentified and inscrutable issues

That's unfortunately a lot of "unidentified" issues, but I figured I'd
at least mark the ones I looked at.

This does reveal that there are some opportunities for toolchain fixes,
fixing multiple packages at a time (and future packages too!), such as
ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
proposal
https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).

I haven't done extensive cross-referencing with other distros, but
suspect there may be patches to fix some of these toolchain issues... If
you're savvy with any of the above languages, help fixing toolchain
issues would be amazing!
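
For anyone who has not worked on this kind of fix before, the timestamp and ordering categories are commonly addressed with two generic patterns: honouring the SOURCE_DATE_EPOCH variable from the Reproducible Builds specification instead of the current time, and sorting file lists in a fixed locale. The sketch below only illustrates those patterns; it is not the fix for any of the toolchains named above, the helper names are hypothetical, and the `-d @SECONDS` syntax assumes GNU date.

```shell
# Hypothetical helper: print a build date that honours SOURCE_DATE_EPOCH
# and falls back to the current time when the variable is unset.
# Requires GNU date for the "-d @SECONDS" syntax.
build_date() {
    date -u -d "@${SOURCE_DATE_EPOCH:-$(date +%s)}" '+%Y-%m-%d'
}

# Hypothetical helper: list the regular files below a directory in an order
# that depends neither on the file system nor on the user's locale.
sorted_files() {
    (cd "$1" && find . -type f -print | LC_ALL=C sort)
}
```

A build script that stamps its output with `build_date` and iterates over `sorted_files` gives the same result on any day and on any file system, which is the property the categories above are missing.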


I'm not sure where to collaborate on this stuff, I've just got a local
git repository and it's a bit rough. I could also push a branch to
guix.git with something like this in it.

There is a rough proposal for using a multi-project "notes" format that
debian uses:

  https://salsa.debian.org/reproducible-builds/reproducible-notes/-/tree/master
  
https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/multi-project-syntax/ideas_on_sharing_notes_between_distros

... back in 2016, and touched on at later Reproducible Builds summits,
but not really adopted as far as I know. But I know some of the issues
are essentially the same across distros; yet some are surprisingly
different even with the same source code!


If you're looking to get your hands dirty with some reproducibility
fixes in guix, a fair number of the timestamp, date and kernel version
fixes are likely fairly easy, but you generally have to manually verify
that the date or kernel version aren't embedded, as "guix build
--rounds=2" will likely happen with the same kernel version and date.
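
One rough way to do that manual check is to search the built output for the strings the build machine would have leaked. The sketch below is an assumption-laden illustration rather than an established workflow: `files_embedding` is a hypothetical helper, and it only finds strings stored uncompressed and in the exact form given.

```shell
# Hypothetical helper: list files under a directory (for example a store
# item) that contain any of the given literal strings.
#   -r  recurse, -l  print only file names, -a  treat binaries as text,
#   -F  match fixed strings rather than regular expressions.
files_embedding() {
    dir=$1
    shift
    for needle in "$@"; do
        grep -r -l -a -F -e "${needle}" "${dir}"
    done | sort -u
}
```

For example, `files_embedding "$(guix build hello)" "$(uname -r)" "$(date +%Y-%m-%d)"` would list files in the `hello` output that mention the running kernel release or today's date. The package and the date format are placeholders, since each package formats its embedded date differently.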


Will be curious to see any new and exciting issues after the staging
merge!


live well,
  vagrant


guix-rb-notes.yml
Description: Binary data




Re: Merging ‘staging’?

2022-06-12 Thread Ludovic Courtès
Hi,

Efraim Flashner writes:

> Let's do it

S… turns out commit e31ab8c24848a7691a838af8df61d3e7097cddbc on
‘master’ unwillingly triggered a rebuild of 2K packages.

It’s too late to revert (they’ve been built anyway), but I’ve merged
‘master’ in ‘staging’ so they can be built there.

Which means I’ll merge ‘staging’ in ‘master’ tomorrow morning if nothing
has collapsed by then.  :-)

Ludo’.



Re: Merging ‘staging’?

2022-06-12 Thread Ludovic Courtès
Hi,

Thiago Jung Bauermann writes:

> Sorry for the delay. I've built some packages from the staging branch on
> powerpc64le-linux (including Emacs, which brings in a lot of stuff) and
> it seems good.

That’s good news, thanks for testing!

Ludo’.



Re: Missing key for andreas on Savannah

2022-06-12 Thread Andreas Enge
Hello,

On Sun, Jul 25, 2021 at 12:35:13AM +0200, Tobias Geerinckx-Rice wrote:

this ended up in the Guix folder at a time I was not reading it, so I am
discovering it only now...

> Your GPG key used to sign Guix commits is missing from your Savannah
> profile[0] and hence its Guix keyring[1].

Ah, I was not aware that this even existed; I thought everything is
handled in the keyring branch of Guix? Thanks for letting me know, I just
added my GPG key there.

> (I'm ignoring the 'enge' ghost(?) account :-)

This is also me, but another me ;-)

Andreas




Re: On commit access, patch review, and remaining healthy

2022-06-12 Thread Maxime Devos
Giovanni Biscuolo wrote on Sun 12-06-2022 at 11:42 [+0200]:
> > or have packages with bundled dependencies (e.g. vendored jars).
> 
> bundling binaries is (is it?) for sure against the definition of a
> reproducible build, but what about bundling (source) dependencies?
> 
> AFAIU not to bundle (source) dependencies is an additional Guix
> requirement (and it is a Good Thing™): am I missing something?

FWIW, sometimes the bundled ‘source’ dependencies contain bundled
binaries of their own.  So while AFAICT not strictly necessary for
reproducible builds, unbundling ‘source dependencies’ makes ensuring
reproducibility(*) much more convenient.

(*) i.e., the non-trivial kind of reproducibility, where things are
actually built from source instead of copying binaries.

> honestly I did not study all the reproducible-builds.org
> documentation,
> but it's impossible for me to understand how a packaged upstream jar
> can be considered reproducible (and bootstrappable); maybe distros
> like NixOS are still slowly transitioning to a full reproducible
> build workflow?

It's ‘reproducible’ in the trivial sense that you can ‘reproduce’ a
scientific paper by putting it in a photocopier.  That way, you can
reproduce the results, but you cannot confirm whether these results
were correct.

Greetings,
Maxime.





Re: Test US mirror for bordeaux.guix.gnu.org and slow downloading of substitutes

2022-06-12 Thread Christopher Baines

Christopher Baines  writes:

> So, one thing that I'd be interested in, is hearing from anyone who
> thinks they get worse download performance from bordeaux.guix.gnu.org or
> ci.guix.gnu.org than they get when downloading other
> things. Importantly, it would be good to know roughly where
> (geographically) the machine doing the downloading is, and some data to
> show the difference.
>
> For example, I'm in the United Kingdom in Europe, and this is the output
> from wget downloading a ~200M file from bordeaux.guix.gnu.org:
>
>   → wget https://bordeaux.guix.gnu.org/nar/lzip/078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0
>   --2022-05-20 16:49:56--  https://bordeaux.guix.gnu.org/nar/lzip/078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0
>   Resolving bordeaux.guix.gnu.org (bordeaux.guix.gnu.org)... 2a0c:e300::58, 185.233.100.56
>   Connecting to bordeaux.guix.gnu.org (bordeaux.guix.gnu.org)|2a0c:e300::58|:443... connected.
>   HTTP request sent, awaiting response... 200 OK
>   Length: 208615205 (199M) [text/plain]
>   Saving to: ‘078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0.6’
>
>   078vr3r8mn3yrwzwxw64 100%[==>] 198.95M  4.24MB/s    in 46s
>
>   2022-05-20 16:50:43 (4.31 MB/s) - ‘078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0.6’ saved [208615205/208615205]
>
>
> Also, I've set up a US based (Hetzner data center, east coast) mirror of
> bordeaux.guix.gnu.org:
>
>   https://bordeaux-us-east-mirror.cbaines.net/
>
> So, I'd also be interested in seeing how that performs for people, and
> how it compares against bordeaux.guix.gnu.org, which is hosted in France
> in Europe.
>
> Here's my output from wget:
>
>   → wget https://bordeaux-us-east-mirror.cbaines.net/nar/lzip/078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0
>   --2022-05-20 16:50:44--  https://bordeaux-us-east-mirror.cbaines.net/nar/lzip/078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0
>   Resolving bordeaux-us-east-mirror.cbaines.net (bordeaux-us-east-mirror.cbaines.net)... 5.161.49.48
>   Connecting to bordeaux-us-east-mirror.cbaines.net (bordeaux-us-east-mirror.cbaines.net)|5.161.49.48|:443... connected.
>   HTTP request sent, awaiting response... 200 OK
>   Length: 208615205 (199M) [text/plain]
>   Saving to: ‘078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0.7’
>
>   078vr3r8mn3yrwzwxw64 100%[==>] 198.95M  4.17MB/s    in 47s
>
>   2022-05-20 16:51:32 (4.22 MB/s) - ‘078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0.7’ saved [208615205/208615205]

Hey,

I'm still interested in hearing about how the mirrors perform for
different people.

I've now set up a test mirror in Singapore, in addition to the one in the
US. Plus, there's bishan, which is hosted in Germany.

So, here are the 4 domains which serve bordeaux.guix.gnu.org nars by
geographical location:

  France:     bordeaux.guix.gnu.org
  Germany:    bishan.guix.gnu.org
  US:         bordeaux-us-east-mirror.cbaines.net
  Singapore:  bordeaux-singapore-mirror.cbaines.net
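
To produce comparable numbers for all of these hosts in one go, something like the following could be used. It is a sketch under assumptions: it presumes every mirror serves the same stellarium nar path as in the wget runs quoted above and that the nar is still available, it uses curl rather than wget, and the function names are made up for illustration.

```shell
# Path of the ~200M stellarium nar used in the wget runs quoted above.
NAR_PATH=nar/lzip/078vr3r8mn3yrwzwxw64hmcyshic9p3q-stellarium-0.21.0

# Build the download URL for one mirror host.
nar_url() {
    printf 'https://%s/%s\n' "$1" "${NAR_PATH}"
}

# Download the nar from each host given as an argument, discard the data and
# print "host <TAB> average bytes per second <TAB> total seconds" per host.
# (compare_mirrors is a hypothetical helper; it needs curl and network access.)
compare_mirrors() {
    for host in "$@"; do
        printf '%s\t' "${host}"
        curl --silent --show-error --output /dev/null \
             --write-out '%{speed_download}\t%{time_total}\n' \
             "$(nar_url "${host}")"
    done
}
```

Running `compare_mirrors bordeaux.guix.gnu.org bishan.guix.gnu.org bordeaux-us-east-mirror.cbaines.net bordeaux-singapore-mirror.cbaines.net` and posting the output together with a rough location would give the kind of data asked for here.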

Chris




Re: On commit access, patch review, and remaining healthy

2022-06-12 Thread Giovanni Biscuolo
Hi Ricardo and all,

following this discussion, a great presentation made by Prot came to
my mind:

https://protesilaos.com/codelog/2021-12-21-emacsconf2021-freedom/
«How Emacs made me appreciate software freedom»

especially the "You can't be an Emacs tourist" part; I think that
similar arguments can be adapted to a "(Guix?) Software developer can't
be a repro+bootstrapping tourist" (to fully understand my analogy please
read or listen to Prot's presentation)

concerning this discussion, this is probably the most interesting part:

--8<---cut here---start->8---

Now you may wonder why do I mention those things?  Shouldn't we make
Emacs easier for everyone?  Yes, we should make everything as simple as
possible.  Though that still does not refashion Emacs into something
entirely different.  We continue to have a potent tool at our disposal
that we must treat with the requisite respect.  Take, for instance, the
various frameworks that set up Emacs in an opinionated way so that
newcomers get everything set up for them out-of-the-box.  There is
nothing wrong with those frameworks.  In fact, a large part of the
community uses them to great effect.  However, the point stands: even
after every package has been set up for you, you still have to put in
the work in making use of your newfound computing freedom.

--8<---cut here---end--->8---

Ricardo Wurmus  writes:

[...]

>>> - We build strictly from source.
>>
>> This is also a requirement now adopted by many other distributions, at
>> least all the ones in https://reproducible-builds.org/who/projects/
>
> NixOS is on the list, but they don’t have this requirement.  That’s why
> they have Java packages that are little more than the upstream jars,

good point Ricardo, the very moment I started replying I had it in my
mind but forgot to write it

I guess that all experienced packagers or maintainers understand well
what's needed in order to get a reproducible AND bootstrappable package:
almost all of the "constraints" Guix "imposes" on packagers and
contributors depend on this... let's call them "golden rules of
software security"?

I just feel sometimes it's hard for newcomers to understand this,
especially considering that unfortunately both some projects in that
list (https://reproducible-builds.org/who/projects/) and some (some?)
upstream developers do not care much about them

the "tag line" of https://reproducible-builds.org/ is

--8<---cut here---start->8---

Reproducible builds are a set of software development practices that
create an independently-verifiable path from source to binary code.

--8<---cut here---end--->8---

honestly I did not study all the reproducible-builds.org documentation,
but it's impossible for me to understand how a packaged upstream jar can
be considered reproducible (and bootstrappable); maybe distros like
NixOS are still slowly transitioning to a full reproducible build
workflow?

IMHO the simple fact that (some, one?) projects listed on
reproducible-builds.org are still bundling binaries in their packages
is too confusing for newcomers

> or have packages with bundled dependencies (e.g. vendored jars).

bundling binaries is (is it?) for sure against the definition of a
reproducible build, but what about bundling (source) dependencies?

AFAIU not to bundle (source) dependencies is an additional Guix
requirement (and it is a Good Thing™): am I missing something?

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures




Re: On commit access, patch review, and remaining healthy

2022-06-12 Thread Ricardo Wurmus


>> - We have strict conventions for commit messages. Our commit message
>>   Changelog is a strange dated practice from the time before good
>>   version control systems. I can live with it, but not everyone likes
>>   it. Let's just say I've heard complaints about it offlist.
>
> AFAIU this is a requirement Guix inherits from GNU (being it a GNU
> project)
>
> I don't remember the rationale for this requirement but AFAIU it made
> sense to me when I read that.
>
> I just hope this requirement is not keeping people from contributing
> and reviewing patches.
>
> Maybe we could help users not using Emacs with other editor-related
> snippets in [~/src/guix/]etc/snippets? (I don't know other editors
> templating systems)

We have an editor-agnostic tool in ./etc/committer.scm.

-- 
Ricardo



Re: On commit access, patch review, and remaining healthy

2022-06-12 Thread Ricardo Wurmus


> Date: Fri, 10 Jun 2022 14:27:44 +0200
> From: Giovanni Biscuolo 
> To: Arun Isaac, Guix Devel
> Cc: GNU Guix maintainers
> Subject: Re: On commit access, patch review, and remaining healthy
> Message-ID: <87o7z0itz3@xelera.eu>
> Content-Type: text/plain; charset="utf-8"
>
>> - We build strictly from source.
>
> This is also a requirement now adopted by many other distributions, at
> least all the ones in https://reproducible-builds.org/who/projects/

NixOS is on the list, but they don’t have this requirement.  That’s why
they have Java packages that are little more than the upstream jars, or
have packages with bundled dependencies (e.g. vendored jars).

-- 
Ricardo