On Sat Aug 2, 2025 at 9:17 PM CEST, Ian Jackson wrote:
I've come back from a party and am a bit tipsy so I will read this
properly later, but:
Thanks for engaging with these questions!
I'm replying to your email after a small party too, but at least I have
slept a couple of hours :)
I think in principle it might be a .sig.
Maybe, yes, but regardless of the input signature filename, pristine-tar
always stores the signature in Git under the name orig.asc. Also,
doesn't dpkg-source look only for .asc files?
So the .id contains the tree (git tree object) which uniquely
identifies the *contents* of the tarball.
Yes, but see below.
But how does the pristine-tar information specify the precise hash of
the tarball itself? Does the .delta file say what the output hash is
supposed to be?
Yes, I've checked now and the .delta contains the expected SHA256 hash.
I don't think I fully understand the implications. My default
position is that the answer should be "no" unless one of us *does*
understand the implications :-).
One "innocuous" example, which I see no problem in allowing, is an orig
tarball that contains empty directories, which cannot be represented in
Git. For example:
$ tar -xvzf mypackage_1.0.orig.tar.gz
mypackage/
mypackage/file.txt
mypackage/empty_dir/
$ cd mypackage
$ git init -b upstream/latest
$ git add --all
$ git commit -m init
$ pristine-tar commit ../mypackage_1.0.orig.tar.gz upstream/latest
$ git show pristine-tar:mypackage_1.0.orig.tar.gz.id | xargs git show
tree 385d33e969fefd23b8efaca69c1d2db507ce0daf
file.txt
$ rm ../mypackage_1.0.orig.tar.gz
$ pristine-tar --debug checkout mypackage_1.0.orig.tar.gz
pristine-tar: set subdir to mypackage
pristine-tar: subdir is mypackage
pristine-tar: mypackage/empty_dir/ is listed in the manifest but may not be present in the source directory
pristine-tar: creating missing mypackage/empty_dir/
pristine-tar: doing full tree sweep to catch missing files
pristine-tar: successfully generated mypackage_1.0.orig.tar.gz
$ tar -tzf mypackage_1.0.orig.tar.gz
mypackage/
mypackage/file.txt
mypackage/empty_dir/
A different example, which may illustrate the "unexpected" results this
could lead to, is the following. Here, the tarball is created with a
file containing "evil" content, while only the "good" content is stored
in the upstream/latest branch. Upon tarball checkout, the good content
gets replaced with the evil one:
$ mkdir repo
$ echo evil > repo/file.txt
$ tar -czf repo_1.0.orig.tar.gz repo
$ echo good > repo/file.txt
$ cd repo
$ git init -b upstream/latest
$ git add --all
$ git commit -m init
$ pristine-tar commit ../repo_1.0.orig.tar.gz upstream/latest
$ git show pristine-tar:repo_1.0.orig.tar.gz.id
ca1cc63dd18610bc64a150397556d33e850a61e8
$ git rev-parse --verify --end-of-options 'upstream/latest^{tree}'
ca1cc63dd18610bc64a150397556d33e850a61e8
$ git show ca1cc63dd18610bc64a150397556d33e850a61e8:file.txt
good
$ rm ../repo_1.0.orig.tar.gz
$ pristine-tar checkout repo_1.0.orig.tar.gz
$ tar -xvzf repo_1.0.orig.tar.gz
repo/
repo/file.txt
$ cat repo/file.txt
evil
Even though both the pristine-tar .id file and the upstream/latest
branch point to the same tree id, the binary .delta contains
modifications to file.txt which change the contents from "good" (stored
in the git tree) to "evil" upon orig checkout.
Although this example is artificial (the tarball contents are usually
committed to version control after the tarball has been downloaded, not before),
it would still theoretically be possible for a malicious maintainer to
sneak a backdoor in (like in the xz backdoor case, but with the extra
step of also having a Debian maintainer collaborate). So I'm inclined to
say "sorry, no, this is too dangerous".
It is also true that this is currently allowed in regular Salsa repos,
so allowing this would not really make the situation worse.
The thing is: how do we disallow this? I'm not aware of any pristine-tar
switch that makes it fail when the .delta file modifies file contents.
Do we have to perform our own check *after* the tarball is checked out,
e.g. by extracting it again on top of the upstream commit tree and
making sure no differences exist? Hacky, but it may work.
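A rough sketch of that kind of post-checkout check, as a variant of the idea: rather than extracting on top of the working tree, it unpacks the tarball and a `git archive` export of the ref into two temporary directories and compares them with `diff -r`. verify_orig_tarball is a hypothetical helper, not a pristine-tar option. It assumes GNU-style tar/find (`--strip-components`, `-delete`), a gzip tarball with a single top-level directory, and no export-ignore/export-subst attributes that would make `git archive` differ from the raw tree. Empty directories are dropped first, so the harmless case from the first example still passes:

```shell
# Hypothetical helper (not a pristine-tar feature): check that the files in an
# orig tarball, e.g. one just produced by `pristine-tar checkout`, have exactly
# the same contents as the tree of a Git ref. Run it inside the repository.
# Returns 0 if they match, 1 if they differ (diff on stderr), 2 on error.
verify_orig_tarball() {
    local tarball="$1" ref="$2" tmp status

    # Refuse refs that do not resolve to a tree.
    git rev-parse --verify --quiet "$ref^{tree}" >/dev/null || return 2

    tmp=$(mktemp -d) || return 2
    mkdir "$tmp/tar" "$tmp/git"

    # Unpack the tarball, stripping its single top-level directory
    # (e.g. repo/) so both sides share the same layout.
    if ! tar -xzf "$tarball" -C "$tmp/tar" --strip-components=1; then
        rm -rf "$tmp"; return 2
    fi

    # Export the committed tree without touching the working copy.
    # Caveat: git archive honours export-ignore/export-subst attributes.
    if ! git archive --format=tar "$ref" | tar -xf - -C "$tmp/git"; then
        rm -rf "$tmp"; return 2
    fi

    # Drop empty directories: Git cannot store them, so they are the
    # expected, harmless difference.
    find "$tmp/tar" -mindepth 1 -type d -empty -delete

    # Compare file lists and contents recursively (modes are not checked).
    if diff -r "$tmp/tar" "$tmp/git" >&2; then status=0; else status=1; fi
    rm -rf "$tmp"
    return "$status"
}
```

Usage would be something like `pristine-tar checkout repo_1.0.orig.tar.gz && verify_orig_tarball repo_1.0.orig.tar.gz upstream/latest`, which for the "evil" example above should report the file.txt difference and fail. Note that it only compares the set of files and their contents; modes, ownership and timestamps, which also come out of the .delta, are not checked.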
Let me know! Bye :)