Hi!

On Wed, 2024-04-03 at 23:53:56 +0100, James Addison wrote:
> On Wed, 3 Apr 2024 19:36:33 +0200, Guillem wrote:
> > On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
> > > On 2024-03-29 22:41, Guillem Jover wrote:
> > > I think with my upstream hat on I'd rather ship a clear manifest (checked
> > > into Git) that tells distributions which files in the distribution tarball
> > > are build artifacts, and guarantee that if you delete all of those files,
> > > the remaining tree should be byte-for-byte identical with the
> > > corresponding signed Git tag.  (In other words, Guillem's suggestion.)
> > > Then I can continue to ship only one release artifact.
> >
> > I've been pondering about this and I think I might have come up with a
> > protocol that to me (!) seems safe, even against a malicious upstream. And
> > does not require two tarballs which as you say seems cumbersome, and makes
> > it harder to explain to users. But I'd like to run this through the list
> > in case I've missed something obvious.
> 
> Does this cater for situations where part of the preparation of a source
> tarball involves populating a directory with a list of filenames that
> correspond to hostnames known to the source preparer?
> 
> If that set of hostnames changes, then regardless of the same source
> VCS checkout being used, the resulting distribution source tarball could
> differ.
> 
> Yes, it's a hypothetical example; but given time and attacker patience,
> someone is motivated to attempt any workaround.  In practice the
> difference could be a directory of hostnames or it could be a bitflag
> that is part of a macro that is only evaluated under various nested
> conditions.

I'm not sure whether I've misunderstood your scenario, but if the
distributed tarball contains things not present in the VCS, then with
this proposal those can be easily removed, which means it does not
matter much if they differ between two generations of the same tarball
(it matters in the sense that it's a warning sign, but it does not
matter in the sense that you can still get to the same state as with a
clean VCS checkout).

The other part then is whether the remaining contents differ from what
is in the VCS.

If either of these checks turns up a difference, then that would
require manual review. That of course does not exempt one from
reviewing the VCS, it just potentially removes one avenue for smuggling
artifacts.
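
As a rough illustration of the two checks described above, here is a
minimal Python sketch. It is not part of any existing tool. The function
names (tree_digests, audit_tarball), the report keys, and the idea of
passing the manifest as a plain set of relative paths are all assumptions
made up for this example. It also assumes the tarball has already been
unpacked and that the VCS side is a clean export of the signed tag (no
.git directory).

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each regular file's relative POSIX path to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit_tarball(tarball_dir, vcs_dir, manifest):
    """Compare an unpacked tarball against a clean VCS export.

    `manifest` is a hypothetical set of relative paths that upstream
    declares to be build artifacts. Those are ignored, as if deleted.
    """
    tarball = tree_digests(tarball_dir)
    vcs = tree_digests(vcs_dir)
    declared = set(manifest)

    # What is left of the tarball once the declared artifacts are dropped.
    remaining = {p: d for p, d in tarball.items() if p not in declared}

    return {
        # In the tarball, not in the VCS, and not declared as an artifact.
        "undeclared": sorted(set(remaining) - set(vcs)),
        # In the VCS but absent from the remaining tree (this also catches
        # a manifest that lists a file which is actually tracked).
        "missing": sorted(set(vcs) - set(remaining)),
        # Present on both sides but with different contents.
        "modified": sorted(
            p for p in remaining.keys() & vcs.keys() if remaining[p] != vcs[p]
        ),
    }
```

An empty report would mean the remaining tree matches the VCS export
byte for byte; any non-empty list is the kind of difference that would
call for manual review.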

> To take a leaf from the Reproducible Builds[1] project: to achieve a
> one-to-one mapping between a set of inputs and an output, you need to
> record all of the inputs; not only the source code, but also the build
> environment.
> 
> I'm not yet convinced that source-as-was-written to distributed-source-tarball
> is a problem that is any different to that of distributed-source-tarball to
> built-package.  Changes to tooling do, in reality, affect the output of
> build processes -- and that's usually good, because it allows for
> performance optimizations.  But it also necessitates the inclusion of the
> toolchain and environment to produce repeatable results.

In this case, the property you'd gain is that you do not need to trust
the system of the person preparing the distribution tarball, and can
instead regenerate those outputs from (supposedly) good inputs: the
sources in the distribution tarball and _your_ (or the distribution's)
system toolchain.

The distinction I see from the reproducible builds effort is that in
this case we can just discard some of the inputs and outputs and start
from the original sources.

(Not sure whether that clarifies or I've talked past you now. :)

Thanks,
Guillem
