Jeremy Stanley <[email protected]> writes:

> It's worth noting that in these projects, humans do not ever, ever
> push commits or tags directly into official repositories. Similarly,
> machines do not ever, ever push commits or tags directly into
> official repositories either without first being reviewed and
> approved by one or more humans. Further, there is always a final
> automated acceptance testing phase which occurs between human
> approval and machines pushing what was reviewed to the official
> repositories.
> Beyond development of the additional automation and associated
> complexity surface area increase for potential new bugs, the
> suggested workflow change would involve two additional human review
> checkpoints (the commit with the generated files, and the merge
> commit back into the target branch), essentially doubling the amount
> of human involvement in the release process. This is then multiplied
> across hundreds of projects with developers requesting releases at
> frequent intervals.

I just want to strongly support what Jeremy is saying here.

I am generally convinced that using upstream release tarballs
introduces some audit risk because it provides a less-audited path to
insert malicious code, and I'm also generally convinced that using
upstream signed tags is better when upstream treats those as
equivalent and doesn't put any special effort into building tarballs.
I'm less convinced that it's always possible; there are some *really
convenient* upstream workflows that generate separate release
artifacts from post-processing of the contents of the Git repository,
and I don't think upstreams are going to stop using them.

Jeremy's workflow is an important one that I think people should think
seriously about. It is very nice to not have to duplicate, in the
repository, information that can be derived from the Git history, tag
state, and related information. It is a logical continuation of the
standard advice to not commit generated files to version control,
which has been standard advice for good reason for as long as I've
been involved in computing, and long before the invention of Git.

My day job is Kubernetes-first, so none of the software I work on for
work is packaged for Debian, but I can say from first-hand experience
that versioning based on Git tags, and including that version
information only in release artifacts and not in the repository
outside of the tag, is simply a better workflow for upstream
maintenance than committing version updates to files.
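A minimal sketch of that tag-derived versioning, for concreteness; the
repository, tag name, and identities here are all throwaway stand-ins
created on the spot, not anything from a specific project:

```shell
# Sketch: the version lives only in the annotated tag, never in a
# committed file; release tooling derives it on demand.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q .
git commit -q --allow-empty -m 'initial'
git tag -a v1.2.0 -m 'release 1.2.0'
# Derive the version from tag state alone, e.g. at release-build time.
version=$(git describe --tags)
echo "building release artifact for $version"
```

The tree itself never changes when a release is cut; only the tag (and
anything derived from it at build time) carries the version.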
It works properly with automated testing and the rest of the release
process in a very satisfying way, without creating spurious commits
that would require granting repository write access to an automated
process, or unnecessary and unwanted branches in otherwise-simple
development flows.

I do not want to defend the very specific techniques that
pristine-tar, specifically, uses to regenerate the tarball. I think
pristine-tar is a bit of a bear on a unicycle: the amazing part isn't
that it's good at riding a unicycle, but that it's riding a unicycle
at all. I say that with deep respect for all the convenience that
pristine-tar has given us over the years, since it really does work
for 95% of the things people want to do with it. But it is riddled
with edge cases and requires special support in other tools, and
people have been reporting for just as long as it has existed that it
fails in one situation or another. (In particular, it's common for it
to work today but then fail to reconstruct the tarball with current
tools five or ten years later.)

But the *idea* of having the pristine upstream tarball somewhere is
not necessarily a bad one, and there are other approaches, such as
pristine-lfs, that potentially would work with a Salsa-only
development model.

I understand the point about security and the risk of injection of
malicious artifacts, but I don't agree that this necessarily implies
throwing away the upstream tarball. That is *one* approach to avoiding
that risk, and a simple one, but another would be to verify the
treewise reproducibility of the upstream artifact and, if one is able
to verify it, use the upstream artifact (and thus the upstream
signatures). This is harder and more complicated, but it works with
workflows that upstream may find very valuable and be flatly unwilling
to abandon.
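As a sketch of what that treewise verification could look like: unpack
the upstream tarball, export the tree from the (already
signature-verified) tag, and compare the two trees. This is a
self-contained demo, so the "upstream" repository, tag, and tarball
are all stand-ins created on the spot; in a real check the tarball
would come from upstream and might legitimately differ:

```shell
# Sketch: treewise comparison of an upstream release tarball against
# the tree of the corresponding signed Git tag.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
# Stand-in upstream repository with one tagged release.
mkdir "$work/repo"
cd "$work/repo"
git -c init.defaultBranch=main init -q .
echo 'hello' > README
git add README
git commit -q -m 'initial'
git tag -a v1.2.0 -m 'release 1.2.0'
# Stand-in upstream release tarball; a real upstream might post-process
# its tarball, which is exactly what this check would catch.
git archive --format=tar.gz -o "$work/foo-1.2.0.tar.gz" v1.2.0
# Independent verification: unpack the tarball and the tagged tree,
# then compare them treewise.
mkdir "$work/tarball" "$work/tag"
tar -xzf "$work/foo-1.2.0.tar.gz" -C "$work/tarball"
git archive v1.2.0 | tar -x -C "$work/tag"
if diff -r "$work/tarball" "$work/tag" > /dev/null; then
    result="tarball matches tag tree"
else
    result="tarball diverges from tag tree"
fi
echo "$result"
```

If the trees match, the upstream artifact (and its signature) can be
accepted with the same confidence as the tag itself; if they diverge,
the difference is exactly the content that a reviewer would need to
audit separately.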
I think what Jeremy describes is a very good and thoughtful upstream
workflow, and we should not be trying to talk upstreams out of this
sort of thoughtful approach. Instead, we should try to *reproduce* the
approach from the full Git repository so that we can provide
independent treewise verification of the release artifact. That
provides a security benefit equal to throwing away the upstream
release artifact and using the signed Git tag, while still preserving
the provenance chain to the upstream release tarball.

I don't think this approach is in any way contrary to the design goals
of either dgit or tag2upload, or impossible to implement in their
design model. It will require work, though, because it's a more
complicated technical workflow that requires doing more verification
work.

And, to be clear, I am not volunteering to do the work. But I do want
to stand up and say that I think that work would be valuable, and it
gives both us and upstream something useful that we do not get from
only working from Git tags.

-- 
Russ Allbery ([email protected])           <https://www.eyrie.org/~eagle/>

