John Gilmore:
> kpcyrd <kpc...@archlinux.org> wrote:
>> 1) There's currently no way to tell if a package can be built offline
>> (without trying yourself).
>
> Packages that can't be built offline are not reproducible, by
> definition. They depend on outside events and circumstances
> in order for a third party to reproduce them successfully.
>
> So, fixing that in each package would be a prerequisite to making a
> reproducible Arch distro (in my opinion).
I don't agree. For example, the r-b.o [1] definition doesn't mandate who
needs to archive what. We can probably agree that we mean a "verifiable
path from source to binary code" (and not just repeatability, which is
also sometimes meant by "reproducible builds" in other contexts), but
beyond that the details and motivations differ depending on whom you ask.

To be clear, I'm not saying that what you would like to see is not
worthwhile. Actually, I'm very sympathetic to such archiving goals. But if
Arch Linux, as kpcyrd's mails suggest, right now just wants to verify
their builder output soon-ish after upload, that's fine too and can be
called reproducible, in my opinion.

[1]: https://reproducible-builds.org/docs/definition/

> I don't understand why a "source tree" would store a checksum of a
> source tarball or source file, rather than storing the actual source
> tarball or source file. You can't compile a checksum.

How distros store their source code differs, due to different needs,
historic circumstances, etc. The approach of having just the packaging
definition and patches, and then referring to the "original" source, is
common, and I certainly see its advantages.

> kpcyrd <kpc...@archlinux.org> wrote:
>> Specifically Gentoo and OpenBSD Ports have solutions for this that I
>> really like, they store a generated list of URLs along with a
>> cryptographic checksum in a separate file, which includes crates
>> referenced in e.g. a project's Cargo.lock.
>
> I don't know what a crate or a Cargo.lock is,

It's Cargo's (Rust's package/dependency manager) way to pin specific
dependencies, including hashes of those.

> but rather than fix the problem at its source (include the source
> files), you propose to add another complex circumvention alongside the
> existing package building infrastructure? What is the advantage of
> that over merely doing the "cargo fetch" early rather than late and
> putting all the resulting source files into the Arch source package?
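(As an aside, for readers unfamiliar with it: a Cargo.lock entry pins one
dependency to an exact version, source, and content hash, roughly like
this; the package name and placeholder checksum are illustrative.)

```toml
[[package]]
name = "rand"
version = "0.8.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "<sha256 hex digest of the published .crate file>"
```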
I'm not an Arch developer, but probably because a package source repo
like [2] is much easier for them to handle than if they committed the
source of all (transitive) dependencies [3].

[2]: https://gitlab.archlinux.org/archlinux/packaging/packages/rage-encryption
[3]: https://github.com/str4d/rage/blob/v0.10.0/Cargo.lock

(Note that Arch made the decision, currently rather unusual for a classic
Linux distro, to build Rust programs with the exact dependencies upstream
has defined, and not to package those libraries separately.)

>> 3) All of this doesn't take BUILDINFO files into account
>
> The BUILDINFO files are part of the source distribution needed
> to reproduce the binary distribution. So they would go on the
> source ISO image.
>
>> I did some digging and downloaded the buildinfo files for each package
>> that is present in the archlinux-2024.03.01 iso
>
> Thank you for doing that digging!
>
>> Using plenty of different gcc versions looks
>> annoying, but is only an issue for bootstrapping, not for reproducible
>> builds (as long as everything is fully documented).
>
> I agree that it's annoying. It compounds the complexity of reproducing
> the build. Does Arch get some benefit from doing so?
>
> Ideally, a binary release ISO would be built with a single set of
> compiler tools. Why is Arch using a dozen compiler versions? Just to
> avoid rebuilding binary packages once the binary release's engineers
> decide what compiler is going to be this release's gold-standard
> compiler? (E.g. the one that gets installed when the user runs pacman
> to install gcc.) Or do the release engineers never actually standardize
> on a compiler -- perhaps new ones get thrown onto some server whenever
> someone likes, and suddenly all the users who install a compiler just
> start using that one?

If you look at classic Linux distros, it's the norm to iteratively add
packages to your repo and to build new packages with whatever is in the
(development) repo at that time.
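(For illustration: a .BUILDINFO file is plain "key = value" text whose
"installed" lines record the exact package versions present in the build
root, so the gcc versions used across packages can be tallied with a few
lines. The sample data below is made up; the field layout is what the
digging above relied on.)

```python
# Sample .BUILDINFO content (illustrative, not from a real package).
sample_buildinfo = """\
format = 2
pkgname = example
installed = gcc-13.2.1-3-x86_64
installed = glibc-2.38-7-x86_64
"""

def installed_versions(text: str, pkg: str) -> list[str]:
    """Return the recorded versions of `pkg` from .BUILDINFO text."""
    prefix = f"installed = {pkg}-"
    return [line.split(" = ", 1)[1]
            for line in text.splitlines()
            if line.startswith(prefix)]

print(installed_versions(sample_buildinfo, "gcc"))
# prints: ['gcc-13.2.1-3-x86_64']
```

Running this over all downloaded buildinfo files and deduplicating the
result is essentially how one ends up with the list of compiler versions
discussed above.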
So a single snapshot of a repo will in nearly all cases not contain all
versions needed to reproduce the packages in that snapshot. You will find
this in Arch, Debian, Fedora, .... Some other distros, like Yocto, might
make different decisions, but those are rather the exception.

> It currently seems that there is no guarantee that on day X, if you
> install gcc on Arch (from the Internet) and on the same day you pull in
> the source code of pacman package Y, that it will even build with the
> Day X version of gcc. Is that true?

As described above, for the rolling development repo of most distros
that's true.

> [from a previous mail:]
> If someday an Electromagnetic Pulse weapon destroys all the running
> computers, we'd like to bootstrap the whole industry up again, without
> breadboarding 8-bit micros and manually toggling in programs. Instead,
> a chip foundry can take these two ISOs and a bare laptop out of a locked
> fire-safe, reboot the (Arch Linux) world from them, and then use that
> Linux machine to control the chip-making and chip-testing machines that
> can make more high-function chips. (This would depend on the
> chip-makers keeping good offline fireproof backups of their own
> application software -- but even if they had that, they can't reboot and
> maintain the chip foundry without working source code for their
> controller's OS.)

In such a case, reproducible builds in the sense of ensuring that a
binary matches its source are actually not important (but can be
convenient for checking your recovery build environment).

Simon