On Wed, Jul 2, 2025, 09:39 Simon Josefsson <[email protected]> wrote:
> James Addison <[email protected]> writes:
>
> >> > I think it's pretty much impossible to avoid people 'maliciously'
> >> > misrepresenting their product as 'reproducible' when it really isn't -
> >> > that doesn't seem like something tweaking the definition can fix. I do
> >> > think it's still useful to have precise terms so we can make the
> >> > distinctions clear, though.
> >> >
> >> > I agree "Reproducible Builds" as a whole means "from source to
> >> > binary".
> >>
> >> That would make Debian installer CDs impossible to call reproducible,
> >> since they are built from binaries for which we do not have source code.
> >>
> >> [ ... snip ... ]
> >
> >
> > Does this refer to binary firmware specifically?
>
> I don't know.
>
> There are other corner cases: some existing binary Debian packages were
> built using earlier versions of Debian packages, and this recurses back
> to old versions of Debian packages, some of which may never even have
> been in any official Debian release. Some necessary packages may be
> ancient and removed even from archive.debian.org and only exist on
> snapshot.debian.org. Both systems have policies in place to remove
> packages when various issues are identified (see
> https://snapshot.debian.org/removal/ for a list), so these are known to
> not be complete historic records of what was ever published. Some of
> those no-longer-public packages may be necessary to rebuild some old
> package that in turn, through some chain of build dependencies, is
> needed to rebuild what we use today.
>
> I don't know if anyone did a transitive trace of what packages are
> necessary to rebuild all of modern Debian; does anyone know? I know of
> https://rebuilder-snapshot.debian.net/ which was an effort to publish
> all packages necessary to rebuild a 'stable' release, but it didn't
> include the transitive closure of that set of packages.
>
> This analysis needs to be done for each architecture too.
> I fear even this analysis will be insufficient: when bootstrapping a new
> architecture, I think people historically have created fake packages
> used to bootstrap things. So you can't rely fully on the PACKAGE:VERSION
> to refer to the package that was actually used.
>
> Another problem is that the PACKAGE:VERSION mapping used by Debian
> packages does not easily or uniquely map to a strong cryptographic hash
> checksum of the original package binary. You quickly need to rely on
> weak SHA1 identifiers, and I recall there have been multiple valid
> versions of the same binary (due to security.debian.org rebuilds or
> something like that).
>
> > I would hope that we could agree that building an artifact composed
> > partly or entirely from 100% DFSG binary packages that are themselves
> > reproducible would produce a transitively reproducible build.
>
> That depends on how you define "reproducible"... I think most people
> here don't consider it required to rebuild the transitive closure of
> build dependencies to call something reproducible. So in that case I
> don't think your statement is necessarily true. Also consider the case
> of packages removed because the DFSG interpretation (or definition)
> changed over the years.
>
> > For closed-source binary firmware blobs, the situation does seem less
> > clear. They arguably can be used as fixed inputs to a build to achieve
> > identical bit-for-bit output -- but if I understand correctly, it
> > raises a question of "is complete source code to all inputs required
> > in order to label/certify an artifact as reproducibly buildable?".
>
> Indeed.
>
> And if we don't have source code for object X, how can we tell that it
> is a firmware blob? There is no simple way to know what it is, and some
> methods to establish what it is (disassembly) may be illegal.
>
> > I'm not initially sure how/whether an exception clause could be
> > written to allow binary inputs under some circumstances, without
> > reducing the effectiveness of the definition (because, for example,
> > copying an entirely opaque blob from one directory to another could
> > be argued as within such a redefinition).
>
> Agreed! In my mind, there is a way out of that dilemma:
>
> 1) One term, e.g., "recreatable", to cover the situation where you don't
> have source code for the transitive closure of the set of build
> dependencies, and don't require rebuilding from that source code. This
> admits a degenerate build process of 'cp FOO BAR' to create a
> "recreatable" artifact. The Debian LiveCD would fall into this
> category.
>
> 2) Another term, e.g., "reproducible", to cover the situation where ALL
> source code for ALL build dependencies, including their build
> dependencies, and so on, is available and used to recreate the
> bit-by-bit identical artifact.
>
> The problem is that many people seem to use the term "reproducible" to
> mean 1) today, so if we settle on these definitions, there will still be
> ambiguity unless everyone adopts the new definitions.
>
> There are at least two ways to reach 2) for an OS: bootstrappable builds
> or idempotent builds. Guix shows a bootstrappable build is feasible; I
> don't know of anyone testing idempotent builds of any OS.
>
> /Simon
>

Thank you - my initial feeling is that creating a similar parallel
definition would be likely to divide contribution/community, and I feel
skeptical of, and opposed to, that.

A stricter definition has the benefit that known deficiencies can be
described and tracked like any other form of software bug.

(I have realized that I have replied off-list and that my messages are
pending moderation -- this is bad etiquette on my part, I think. I will
reply to this one message because I would like to participate, but then
I'll probably pause.)

Regards,
James
