On Mon, Apr 1, 2024, at 2:04 PM, Russ Allbery wrote:
> "Zack Weinberg" <z...@owlfolio.org> writes:
>> It might indeed be worth thinking about ways to minimize the
>> difference between the tarball "make dist" produces and the tarball
>> "git archive" produces, starting from the same clean git checkout,
>> and also ways to identify and audit those differences.
>
> There is extensive ongoing discussion of this on debian-devel. There's
> no real consensus in that discussion, but I think one useful principle
> that's emerged that doesn't disrupt the world *too* much is that the
> release tarball should differ from the Git tag only in the form of
> added files. Any files that are present in both Git and in the release
> tarball should be byte-for-byte identical.

That dovetails nicely with something I was thinking about myself.
Obviously the result of "make dist" should be reproducible except for
signatures; to the extent it isn't already, those are bugs in automake.
But also, what if "make dist" produced *two* disjoint tarballs? One of
which is guaranteed to be byte-for-byte identical to an archive of the
VCS at the release tag (in some clearly documented fashion; AIUI, "git
archive" does *not* do what we want). The other contains all the files
that "autoreconf -i" or "./bootstrap.sh" or whatever would create, but
nothing else. Diffs could be provided for both tarballs, or only for
the VCS-archive tarball, whichever turns out to be more compact (I can
imagine the diff for the generated-files tarball turning out to be
comparable in size to the generated-files tarball itself).

This should make it much easier to find, and therefore audit, the
pre-generated files, and to validate that there's no overlap. It would
add an extra step for people who want to build from tarball, without
having to install autoconf (or whatever) first -- but an easier extra
step than, y'know, installing autoconf. :) Conversely, people who want
to build from tarballs but *not* use the pre-generated configure, etc,
could now download the 'bare' tarball only.

("Couldn't those people just build from a git checkout?" Not if they
don't have the tooling for it, not during early stages of a
distribution bootstrap, etc. Also, the act of publishing a tarball
that's a golden copy of the VCS at the release tag is valuable for
archival purposes.)

zw
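
To make the audit idea concrete, here is a rough sketch (not anything
automake provides) of how one might check the "added files only" rule
and the "no overlap" rule for a pair of tarballs. The function names
file_digests, audit, and overlap are made up for illustration. It
assumes each tarball unpacks into a single top-level directory such as
"pkg-1.0/", and it compares only the contents of regular files, not
modes, timestamps, or symlinks.

```python
import hashlib
import tarfile


def file_digests(tar_path):
    """Map each regular file's path (top-level dir stripped) to its SHA-256."""
    digests = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, devices, etc.
            # Assumption: everything sits under one top-level directory,
            # e.g. "pkg-1.0/configure.ac" -> "configure.ac".
            _, _, rel = member.name.partition("/")
            if not rel:
                continue  # a file outside any directory; ignored here
            data = tar.extractfile(member).read()
            digests[rel] = hashlib.sha256(data).hexdigest()
    return digests


def audit(vcs_tar, dist_tar):
    """Compare a VCS-archive tarball against a release tarball.

    Returns (added, missing, modified) as sorted lists of paths.
    Under the "added files only" principle, missing and modified
    should both be empty.
    """
    vcs = file_digests(vcs_tar)
    dist = file_digests(dist_tar)
    added = sorted(set(dist) - set(vcs))
    missing = sorted(set(vcs) - set(dist))
    modified = sorted(p for p in set(vcs) & set(dist) if vcs[p] != dist[p])
    return added, missing, modified


def overlap(bare_tar, generated_tar):
    """Paths present in both tarballs; empty if the two are disjoint."""
    return sorted(set(file_digests(bare_tar)) & set(file_digests(generated_tar)))
```

For the single-tarball case, audit(vcs_tar, dist_tar) lists the added
files that need review and flags anything missing or modified. For the
two-tarball scheme, overlap(bare_tar, generated_tar) should come back
empty.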