Jeremy Stanley <[email protected]> writes:

> It's worth noting that in these projects, humans do not ever, ever
> push commits or tags directly into official repositories. Similarly,
> machines do not ever, ever push commits or tags directly into
> official repositories either without first being reviewed and
> approved by one or more humans. Further, there is always a final
> automated acceptance testing phase which occurs between human
> approval and machines pushing what was reviewed to the official
> repositories.
> Beyond development of the additional automation and associated
> complexity surface area increase for potential new bugs, the
> suggested workflow change would involve two additional human review
> checkpoints (the commit with the generated files, and the merge
> commit back into the target branch), essentially doubling the amount
> of human involvement in the release process. This is then multiplied
> across hundreds of projects with developers requesting releases at
> frequent intervals.

I just want to strongly support what Jeremy is saying here.

I am generally convinced that using upstream release tarballs
introduces some audit risk because it provides a less-audited path to
insert malicious code, and I'm also generally convinced that using
upstream signed tags is better when upstream treats those as
equivalent and doesn't put any special effort into building tarballs.
I'm less convinced that it's always possible; there are some *really
convenient* upstream workflows that generate separate release
artifacts from post-processing of the contents of the Git repository,
and I don't think upstreams are going to stop using them.

Jeremy's workflow is an important one that I think people should think
seriously about. It is very nice to not have to duplicate, in the
repository, information that can be derived from the Git history, tag
state, and related information. It is a logical continuation of the
standard advice to not commit generated files to version control,
which has been standard advice for good reason for as long as I've
been involved in computing, and long before the invention of Git.

My day job is Kubernetes-first, so none of the software I work on for
work is packaged for Debian, but I can say from first-hand experience
that versioning based on Git tags, and including that version
information only in release artifacts and not in the repository
outside of the tag, is simply a better workflow for upstream
maintenance than committing version updates to files.
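A minimal sketch of that tag-derived versioning, for concreteness; the
repository, tag name, and identities here are all throwaway stand-ins
created on the spot, not anything from a specific project:

```shell
# Sketch: the version lives only in the annotated tag, never in a
# committed file; release tooling derives it on demand.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q .
git commit -q --allow-empty -m 'initial'
git tag -a v1.2.0 -m 'release 1.2.0'
# Derive the version from tag state alone, e.g. at release-build time.
version=$(git describe --tags)
echo "building release artifact for $version"
```

The tree itself never changes when a release is cut; only the tag (and
anything derived from it at build time) carries the version.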
It works properly with automated testing and the rest of the release
process in a very satisfying way, without creating spurious commits
that would require granting repository write access to an automated
process, or unnecessary and unwanted branches in otherwise-simple
development flows.

I do not want to defend the very specific techniques that
pristine-tar, specifically, uses to regenerate the tarball. I think
pristine-tar is a bit of a bear on a unicycle: the amazing part isn't
that it's good at riding a unicycle, but that it's riding a unicycle
at all. I say that with deep respect for all the convenience that
pristine-tar has given us over the years, since it really does work
for 95% of the things people want to do with it. But it is riddled
with edge cases and requires special support in other tools, and
people have been reporting for just as long as it has existed that it
fails in one situation or another. (In particular, it's common for it
to work today but then fail to reconstruct the tarball with current
tools five or ten years later.)

But the *idea* of having the pristine upstream tarball somewhere is
not necessarily a bad one, and there are other approaches, such as
pristine-lfs, that potentially would work with a Salsa-only
development model.

I understand the point about security and the risk of injection of
malicious artifacts, but I don't agree that this necessarily implies
throwing away the upstream tarball. That is *one* approach to avoiding
that risk, and a simple one, but another would be to verify the
treewise reproducibility of the upstream artifact and, if one is able
to verify it, use the upstream artifact (and thus the upstream
signatures). This is harder and more complicated, but it works with
workflows that upstream may find very valuable and be flatly unwilling
to abandon.
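As a sketch of what that treewise verification could look like: unpack
the upstream tarball, export the tree from the (already
signature-verified) tag, and compare the two trees. This is a
self-contained demo, so the "upstream" repository, tag, and tarball
are all stand-ins created on the spot; in a real check the tarball
would come from upstream and might legitimately differ:

```shell
# Sketch: treewise comparison of an upstream release tarball against
# the tree of the corresponding signed Git tag.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
# Stand-in upstream repository with one tagged release.
mkdir "$work/repo"
cd "$work/repo"
git -c init.defaultBranch=main init -q .
echo 'hello' > README
git add README
git commit -q -m 'initial'
git tag -a v1.2.0 -m 'release 1.2.0'
# Stand-in upstream release tarball; a real upstream might post-process
# its tarball, which is exactly what this check would catch.
git archive --format=tar.gz -o "$work/foo-1.2.0.tar.gz" v1.2.0
# Independent verification: unpack the tarball and the tagged tree,
# then compare them treewise.
mkdir "$work/tarball" "$work/tag"
tar -xzf "$work/foo-1.2.0.tar.gz" -C "$work/tarball"
git archive v1.2.0 | tar -x -C "$work/tag"
if diff -r "$work/tarball" "$work/tag" > /dev/null; then
    result="tarball matches tag tree"
else
    result="tarball diverges from tag tree"
fi
echo "$result"
```

If the trees match, the upstream artifact (and its signature) can be
accepted with the same confidence as the tag itself; if they diverge,
the difference is exactly the content that a reviewer would need to
audit separately.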
I think what Jeremy describes is a very good and thoughtful upstream
workflow, and we should not be trying to talk upstreams out of this
sort of thoughtful approach. Instead, we should try to *reproduce* the
approach from the full Git repository so that we can provide
independent treewise verification of the release artifact. That
provides a security benefit equal to throwing away the upstream
release artifact and using the signed Git tag, while still preserving
the provenance chain to the upstream release tarball.

I don't think this approach is in any way contrary to the design goals
of either dgit or tag2upload, or impossible to implement in their
design model. It will require work, though, because it's a more
complicated technical workflow that requires doing more verification
work.

And, to be clear, I am not volunteering to do the work. But I do want
to stand up and say that I think that work would be valuable, and it
gives both us and upstream something useful that we do not get from
only working from Git tags.

-- 
Russ Allbery ([email protected])           <https://www.eyrie.org/~eagle/>

