On 3/29/24 6:48 AM, John Gilmore wrote:
John Gilmore <g...@toad.com> wrote:
Bootstrappable builds are a different thing.  Worthwhile, but not
what I was asking for.  I just wanted provable reproducibility from two
ISO images and nothing more.

I was asking that a bare amd64 be able to boot from an Arch Linux
*binary* ISO image.  And then be fed a matching Arch Linux *source* ISO
image.  And that the scripts in the source image would be able to
reproduce the binary image from its source code, running the binaries
(like the kernel, shell, and compiler) from the binary ISO image to do
the rebuilds (without Internet access).

This should be much simpler than doing a bootstrap from bare metal
*without* a binary ISO image.

I think this project would still be somewhat involved:

1) There's currently no way to tell if a package can be built offline (without trying yourself). Some distros have `options=(!net)`-like settings, but pacman currently doesn't. Needing network access for things like `cargo fetch` or `go mod download` is considered acceptable in Arch Linux, since these extra inputs are pinned by cryptographic hash (the PKGBUILD acts as a merkle-tree root).

Gentoo and OpenBSD Ports in particular have solutions for this that I really like: they store a generated list of URLs, each with a cryptographic checksum, in a separate file, which includes crates referenced in e.g. a project's Cargo.lock. Once these are unpacked to the right location, the build itself does not need any additional network resources and can run fully offline.

This concept currently does not exist in pacman. One would potentially need to generate 100+ lines into the source= array of a PKGBUILD (and another 200+ lines for checksums if 2 checksum algorithms are used). This is currently considered bad style, because the PKGBUILD is supposed to be short, simple and easy to read/understand/audit.

2) The official ISO is meant for installation and maintenance, and does not contain a compiler; I'm not sure it should. Many of the other base-devel packages are also missing. Since you would also need the build dependencies of all the packages you're using (recursively?), this should likely be its own ISO (at which point it could include the source code as well).

3) None of this takes BUILDINFO files into account. You can use Arch Linux as a source-based distro, but if you want exact matches with the official packages, you would need to match the compiler version that was used for each respective package.

I did some digging and downloaded the buildinfo files for each package that is present in the archlinux-2024.03.01 ISO (using the archlinux-userland-fs-cmp tool). In total, these gcc versions have been used (gcc7 is part of the usb_modeswitch build environment, but I didn't bother investigating why):

gcc7-7.4.1+20181207-3-x86_64
gcc-9.2.0-4-x86_64
gcc-9.3.0-1-x86_64
gcc-10.1.0-1-x86_64
gcc-10.1.0-2-x86_64
gcc-10.2.0-3-x86_64
gcc-10.2.0-4-x86_64
gcc-10.2.0-6-x86_64
gcc-11.1.0-1-x86_64
gcc-11.2.0-4-x86_64
gcc-12.1.0-2-x86_64
gcc-12.2.0-1-x86_64
gcc-12.2.1-1-x86_64
gcc-12.2.1-2-x86_64
gcc-12.2.1-4-x86_64
gcc-13.1.1-1-x86_64
gcc-13.1.1-2-x86_64
gcc-13.2.1-3-x86_64
gcc-13.2.1-4-x86_64
gcc-13.2.1-5-x86_64

And these versions of the Rust compiler:

rust-1:1.74.0-1-x86_64
rust-1:1.75.0-2-x86_64
rust-1:1.76.0-1-x86_64

In total the build environment of all packages consists of 3704 different (pkgname, pkgver) tuples.
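
For anyone who wants to repeat this kind of count, the sketch below shows roughly how it can be done in Python. It is not the tool I used. It assumes you already have the text of each .BUILDINFO file, and that `installed = ` lines have the pkgname-pkgver-pkgrel-arch shape seen in the lists above. The function names are made up for the example, and it keeps pkgrel as part of the version string.

```python
from collections import defaultdict


def installed_packages(buildinfo_text):
    """Yield (pkgname, version) for each 'installed = ...' line of a .BUILDINFO file."""
    for line in buildinfo_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.strip() == "installed":
            # Entries look like pkgname-pkgver-pkgrel-arch; pkgname itself may contain '-',
            # so split from the right.
            name, pkgver, pkgrel, _arch = value.strip().rsplit("-", 3)
            yield name, f"{pkgver}-{pkgrel}"


def collect_build_environment(buildinfo_texts):
    """Map each pkgname to the set of versions seen across all build environments."""
    versions = defaultdict(set)
    for text in buildinfo_texts:
        for name, version in installed_packages(text):
            versions[name].add(version)
    return versions
```

Summing `len(v)` over the values of the result gives the number of distinct tuples, and looking up a key such as `"gcc"` or `"rust"` gives the per-compiler lists.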

If you disregard this, the packages you build with such an ISO wouldn't match the official packages, but 2 groups with the same ISO could likely produce matching binary packages (assuming they have a way to derive a deterministic SOURCE_DATE_EPOCH value from that ISO).
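
One possible way to get such a value, purely as an illustration: take the release date embedded in the ISO name and use midnight UTC of that day. This is only an assumption on my part about what two groups could agree on. The function name is hypothetical and nothing in Arch Linux prescribes this.

```python
import calendar
import re


def epoch_from_iso_name(iso_name):
    """Derive a SOURCE_DATE_EPOCH from a YYYY.MM.DD release date in an ISO name."""
    match = re.search(r"(\d{4})\.(\d{2})\.(\d{2})", iso_name)
    if match is None:
        raise ValueError(f"no release date found in {iso_name!r}")
    year, month, day = (int(part) for part in match.groups())
    # Midnight UTC, so the result does not depend on the local timezone.
    return calendar.timegm((year, month, day, 0, 0, 0))


# Both groups would export the same value before building.
print(epoch_from_iso_name("archlinux-2024.03.01"))
```

Any other rule works just as well, as long as it depends only on the ISO and not on when or where the build happens.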

From there on you'd "only" need to bootstrap a path to these binary seeds, but that's also why I pointed out this is more relevant to bootstrappable builds. Using plenty of different gcc versions looks annoying, but is only an issue for bootstrapping, not for reproducible builds (as long as everything is fully documented).

If someday an Electromagnetic Pulse weapon destroys all the running
computers, we'd like to bootstrap the whole industry up again, without
breadboarding 8-bit micros and manually toggling in programs.  Instead,
a chip foundry can take these two ISOs and a bare laptop out of a locked
fire-safe, reboot the (Arch Linux) world from them, and then use that
Linux machine to control the chip-making and chip-testing machines that
can make more high-function chips.  (This would depend on the
chip-makers keeping good offline fireproof backups of their own
application software -- but even if they had that, they can't reboot and
maintain the chip foundry without working source code for their
controller's OS.)

I'm personally not interested in this scenario, but I'm aware Allan McRae is looking for funding for pacman development. Maybe somebody could sponsor development of a "build without network" feature in pacman, or support for auto-generated additional sources like those in Gentoo or OpenBSD Ports, mentioned above.

http://allanmcrae.com/about/

cheers,
kpcyrd
