Re: "Reproducible build" definition in OpenSSF glossary

James Addison via rb-general Wed, 02 Jul 2025 02:17:17 -0700

On Wed, Jul 2, 2025, 09:39 Simon Josefsson <[email protected]> wrote:


> James Addison <[email protected]> writes:
>
> >> > I think it's pretty much impossible to avoid people 'maliciously'
> >> > misrepresenting their product as 'reproducible' when it really isn't -
> >> > that doesn't seem like something tweaking the definition can fix. I do
> >> > think it's still useful to have precise terms so we can make the
> >> > distinctions clear, though.
> >> >
> >> > I agree "Reproducible Builds" as a whole means "from source to
> >> > binary".
> >>
> >> That would make Debian installer CDs impossible to call reproducible,
> >> since they are built from binaries for which we do not have source code.
> >>
> >> [ ... snip ... ]
> >
> >
> > Does this refer to binary firmware specifically?
>
> I don't know.
>
> There are other corner cases: some existing binary Debian packages were
> built using earlier versions of Debian packages, and this recurses back
> to old versions of Debian packages, some of which may never even have
> been in any official Debian release.  Some necessary packages may be
> ancient, removed even from archive.debian.org, and exist only on
> snapshot.debian.org.  Both systems have policies in place to remove
> packages when various issues are identified (see
> https://snapshot.debian.org/removal/ for a list), so neither is known
> to be a complete historical record of what was ever published.  Some of
> those no-longer-public packages may be needed to rebuild an old
> package that, through some chain of build dependencies, is in turn
> needed to rebuild what we use today.
>
> I don't know if anyone has done a transitive trace of which packages are
> necessary to rebuild all of modern Debian; does anyone know?  I know of
> https://rebuilder-snapshot.debian.net/ which was an effort to publish
> all packages necessary to rebuild a 'stable' release, but it didn't
> include the transitive closure of that set of packages.
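
The "transitive trace" asked about here amounts to a transitive closure over the build-dependency graph. A minimal sketch in Python, assuming a hypothetical in-memory map from each PACKAGE:VERSION to the PACKAGE:VERSION entries it was built with (real data would have to be collected, for example, from .buildinfo files and snapshot.debian.org). The function `build_closure` and all package versions below are illustrative only, not real Debian data:

```python
from collections import deque


def build_closure(roots, build_deps):
    """Return (closure, missing) for the given root packages.

    build_deps maps a hypothetical "name:version" key to the list of
    "name:version" keys that were installed when it was built. A key with
    no entry at all is treated as missing (e.g. no longer in any archive);
    an empty list means "known, with no build dependencies".
    """
    closure = set()
    missing = set()
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        if pkg in closure:
            continue  # already visited; also guards against cycles
        closure.add(pkg)
        deps = build_deps.get(pkg)
        if deps is None:
            missing.add(pkg)  # reachable, but we cannot look further back
            continue
        queue.extend(d for d in deps if d not in closure)
    return closure, missing


# Toy graph with made-up versions; not real Debian data.
deps = {
    "hello:2.10-3": ["gcc-12:12.2.0-14", "make:4.3-4.1"],
    "make:4.3-4.1": ["gcc-12:12.2.0-14"],
    "gcc-12:12.2.0-14": ["gcc-12:12.2.0-13"],
    "gcc-12:12.2.0-13": ["gcc-11:11.3.0-1"],
}
closure, missing = build_closure(["hello:2.10-3"], deps)
print(len(closure), sorted(missing))  # 5 ['gcc-11:11.3.0-1']
```

Following the point about architectures, such a trace would have to be run once per architecture, and its keys are still PACKAGE:VERSION strings, which may not pin down the exact binary that was used.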
>
> This analysis needs to be done for each architecture too.  I fear even
> this analysis will be insufficient: when bootstrapping a new
> architecture, I think people have historically created fake packages
> to bootstrap things.  So you can't fully rely on PACKAGE:VERSION
> to identify the package that was actually used.
>
> Another problem is that the PACKAGE:VERSION mapping used by Debian
> packages does not easily or uniquely map to a strong cryptographic hash
> checksum of the original package binary.  You quickly end up relying on
> weak SHA-1 identifiers, and I recall there have been multiple valid
> versions of the same binary (due to security.debian.org rebuilds or
> something like that).
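
One way to make the PACKAGE:VERSION problem concrete is to key artifacts by a SHA-256 digest of their bytes and check whether any PACKAGE:VERSION maps to more than one digest. A minimal sketch, assuming hypothetical (name, version, bytes) observations gathered from different archives; `ambiguous_versions`, the package names, and the byte strings are illustrative:

```python
import hashlib
from collections import defaultdict


def ambiguous_versions(observations):
    """Map each ambiguous "name:version" to its distinct SHA-256 digests.

    observations: iterable of (name, version, data) tuples, where data is
    the raw bytes of a package file as fetched from some source.
    Only keys that map to more than one distinct binary are returned.
    """
    digests = defaultdict(set)
    for name, version, data in observations:
        digests[f"{name}:{version}"].add(hashlib.sha256(data).hexdigest())
    return {key: sorted(ds) for key, ds in digests.items() if len(ds) > 1}


# Made-up byte strings standing in for two differing builds of one version.
obs = [
    ("libfoo1", "1.0-1", b"first build"),
    ("libfoo1", "1.0-1", b"later rebuild"),
    ("libbar2", "2.3-1", b"only build"),
]
result = ambiguous_versions(obs)
print(list(result))  # ['libfoo1:1.0-1']
```

Any key this returns cannot be pinned down by PACKAGE:VERSION alone; recording the digest of each build input instead removes that ambiguity.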
>
> > I would hope that we could agree that building an artifact composed partly
> > or entirely from 100% DFSG binary packages that are themselves reproducible
> > would produce a transitively reproducible build.
>
> That depends on how you define "reproducible"...  I think most people
> here don't consider it necessary to rebuild the transitive closure of
> build dependencies to call something reproducible, so in that case I
> don't think your statement is necessarily true.  Also consider packages
> that were removed because the interpretation (or definition) of the DFSG
> changed over the years.
>
> > For closed-source binary firmware blobs, the situation does seem less
> > clear.  They arguably can be used as fixed inputs to a build to achieve
> > identical bit-for-bit output -- but if I understand correctly, it raises a
> > question of "is complete source code to all inputs required in order to
> > label/certify an artifact as reproducibly buildable?".
>
> Indeed.
>
> And if we don't have source code for object X, how can we tell that it
> is a firmware blob?  There is no simple way to know what it is, and some
> methods to establish what it is (disassembly) may be illegal.
>
> > I'm not initially sure how/whether an exception clause could be written to
> > allow binary inputs under some circumstances, without reducing the
> > effectiveness of the definition (because, for example, copying an entirely
> > opaque blob from one directory to another could be argued as within such a
> > redefinition).
>
> Agreed!  In my mind, there is a way out of that dilemma:
>
> 1) One term, e.g., "recreatable", to cover the situation where you don't
> have source code for the transitive closure of the set of build
> dependencies, and don't require that source code to be rebuilt.  This
> admits a degenerate build process of 'cp FOO BAR' to create a
> "recreatable" artifact.  The Debian LiveCD would fall into this
> category.
>
> 2) Another term, e.g., "reproducible", to cover the situation where ALL
> source code for ALL build dependencies, including their build
> dependencies and so on, is available and used to recreate the
> bit-for-bit identical artifact.
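
The distinction between the two proposed terms can be expressed as a check over the transitive closure of build inputs. A minimal sketch, assuming a hypothetical `Artifact` record that already knows whether complete source is available and whether an independent rebuild matched bit-for-bit; the class, its field names, and `classify` are illustrative, not an existing tool:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str
    has_source: bool          # is complete source code available?
    rebuilt_identical: bool   # did an independent rebuild give identical bits?
    inputs: list[Artifact] = field(default_factory=list)  # build dependencies


def classify(artifact: Artifact) -> str:
    """Apply the two proposed terms to an artifact and its build inputs."""
    if not artifact.rebuilt_identical:
        return "not reproducible"
    # Walk the transitive closure of inputs; id() is used because the
    # dataclass is unhashable (eq=True without frozen=True).
    stack, seen = [artifact], set()
    while stack:
        a = stack.pop()
        if id(a) in seen:
            continue
        seen.add(id(a))
        if not (a.has_source and a.rebuilt_identical):
            return "recreatable"  # bits match, but some input is opaque
        stack.extend(a.inputs)
    return "reproducible"


# The degenerate 'cp FOO BAR' case: BAR rebuilds identically, FOO is opaque.
foo = Artifact("FOO", has_source=False, rebuilt_identical=False)
bar = Artifact("BAR", has_source=True, rebuilt_identical=True, inputs=[foo])
print(classify(bar))  # recreatable

cc = Artifact("cc", has_source=True, rebuilt_identical=True)
tool = Artifact("tool", has_source=True, rebuilt_identical=True, inputs=[cc])
print(classify(tool))  # reproducible
```

In practice `cc` would have inputs of its own, recursing back to whatever seed the toolchain was bootstrapped from; the walk can only return "reproducible" if that entire chain is recorded, which is the gap bootstrappable builds aim to close.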
>
> The problem is that many people seem to use the term "reproducible" to
> mean 1) today, so if we settle on these definitions, there will still be
> ambiguity unless everyone adopts the new definitions.
>
> There are at least two ways to reach 2) for an OS: bootstrappable builds
> or idempotent builds.  Guix shows that a bootstrappable build is feasible;
> I don't know of anyone testing idempotent builds of any OS.
>
> /Simon
>

Thank you - my initial feeling is that creating a parallel definition like
this would be likely to divide contributors and the community, and I am
skeptical of, and opposed to, that.

A stricter definition has the benefit that known deficiencies can be
described and tracked like any other form of software bug.

(I have realized that I replied off-list and that my messages are pending
moderation -- bad etiquette on my part, I think.  I will reply to this one
message because I would like to participate, but then I'll probably pause.)

Regards,
James
