On Sat Aug 2, 2025 at 9:17 PM CEST, Ian Jackson wrote:
I've come back from a party and am a bit tipsy so I will read this
properly later, but:

Thanks for engaging with these questions!

I'm replying to your email after a small party too, but at least I have slept a couple of hours :)

I think in principle it might be a .sig.

Maybe yes, but regardless of the input signature filename, pristine-tar always stores the signature in Git with a name of orig.asc. Also, doesn't dpkg-source look for .asc files only?

So the .id contains the tree (git tree object) which uniquely identifies the *contents* of the tarball.

Yes, but see below.

But how does the pristine-tar information specify the precise hash of the tarball itself? Does the .delta file say what the output hash is supposed to be?

Yes, I've checked now and the .delta contains the expected SHA256 hash.

I don't think I fully understand the implications. My default position is that the answer should be "no" unless one of us *does* understand the implications :-).

One "innocuous" case which I see no problem in allowing is an orig tarball that contains empty dirs, which are not representable in Git. As an example:

   $ tar -xvzf mypackage_1.0.orig.tar.gz
   mypackage/
   mypackage/file.txt
   mypackage/empty_dir/

   $ cd mypackage

   $ git init -b upstream/latest

   $ git add --all

   $ git commit -m init

   $ pristine-tar commit ../mypackage_1.0.orig.tar.gz upstream/latest

   $ git show pristine-tar:mypackage_1.0.orig.tar.gz.id | xargs git show
   tree 385d33e969fefd23b8efaca69c1d2db507ce0daf

   file.txt

   $ rm ../mypackage_1.0.orig.tar.gz

   $ pristine-tar --debug checkout mypackage_1.0.orig.tar.gz
   pristine-tar: set subdir to mypackage
   pristine-tar: subdir is mypackage
   pristine-tar: mypackage/empty_dir/ is listed in the manifest but may not be present in the source directory
   pristine-tar: creating missing mypackage/empty_dir/
   pristine-tar: doing full tree sweep to catch missing files
   pristine-tar: successfully generated mypackage_1.0.orig.tar.gz

   $ tar -tzf mypackage_1.0.orig.tar.gz
   mypackage/
   mypackage/file.txt
   mypackage/empty_dir/

A different example, which may illustrate the "unexpected" results this could lead to, is the following. Here, the tarball is created with a file containing "evil" content, while the upstream/latest branch stores only the "good" content. Upon tarball checkout, the good content gets replaced with the evil one:

   $ mkdir repo

   $ echo evil > repo/file.txt

   $ tar -czf repo_1.0.orig.tar.gz repo

   $ echo good > repo/file.txt

   $ cd repo

   $ git init -b upstream/latest

   $ git add --all

   $ git commit -m init

   $ pristine-tar commit ../repo_1.0.orig.tar.gz upstream/latest

   $ git show pristine-tar:repo_1.0.orig.tar.gz.id
   ca1cc63dd18610bc64a150397556d33e850a61e8

   $ git rev-parse --verify --end-of-options 'upstream/latest^{tree}'
   ca1cc63dd18610bc64a150397556d33e850a61e8

   $ git show ca1cc63dd18610bc64a150397556d33e850a61e8:file.txt
   good

   $ rm ../repo_1.0.orig.tar.gz

   $ pristine-tar checkout repo_1.0.orig.tar.gz

   $ tar -xvzf repo_1.0.orig.tar.gz
   repo/
   repo/file.txt

   $ cat repo/file.txt
   evil

Even though both the pristine-tar .id file and the upstream/latest branch point to the same tree id, the binary .delta contains modifications to file.txt which change the contents from "good" (stored in the git tree) to "evil" upon orig checkout.

Even though this example is artificial (the tarball contents are usually committed to version control after it has been downloaded, not before), it would still theoretically be possible for a malicious maintainer to sneak a backdoor in (like in the xz backdoor case, but with the extra step of also having a Debian maintainer collaborate). So I'm inclined to say "sorry, no, this is too dangerous".

It is also true that this is currently allowed in regular Salsa repos, so allowing this would not really make the situation worse.

The thing is: how do we disallow this? I'm not aware of any pristine-tar switch which makes it fail when a .delta file performs file content modifications. Do we have to perform our own check *after* the tarball is checked out, e.g. by extracting it again on top of the upstream commit tree and making sure no differences exist? Hacky, but it may work.
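
To make the idea concrete, here is a rough, untested-in-production sketch of such a post-checkout check. The function names (compare_trees, verify_orig_against_tree) are made up for illustration. It assumes the orig tarball has a single top-level directory, a tar that supports --strip-components, and a find that supports -mindepth, -empty and -delete (GNU and BSD both do). It compares file contents only, not modes or ownership, and since "git archive" honours the export-ignore and export-subst attributes, a real implementation would have to account for those too.

```shell
#!/bin/sh
# Sketch only: check that a tarball regenerated by pristine-tar carries the
# same file contents as a git tree. All function names are hypothetical.

# Compare two directory trees, ignoring empty directories (which git cannot
# store). Returns 0 when the trees match, non-zero otherwise.
compare_trees() {
    tarball_dir=$1
    git_dir=$2
    # Drop empty directories from the tarball side so that they are not
    # reported as differences; -delete processes children before parents.
    find "$tarball_dir" -mindepth 1 -type d -empty -delete
    diff -r "$tarball_dir" "$git_dir"
}

# Extract the tarball and the git tree side by side, then compare them.
verify_orig_against_tree() {
    tarball=$1    # e.g. the output of "pristine-tar checkout"
    treeish=$2    # e.g. upstream/latest
    workdir=$(mktemp -d) || return 2
    mkdir "$workdir/tarball" "$workdir/git"
    # Strip the single top-level directory of the orig tarball.
    tar -xf "$tarball" -C "$workdir/tarball" --strip-components=1 &&
        git archive "$treeish" | tar -xf - -C "$workdir/git" &&
        compare_trees "$workdir/tarball" "$workdir/git"
    status=$?
    rm -rf "$workdir"
    return $status
}
```

Run from inside the repository as "verify_orig_against_tree repo_1.0.orig.tar.gz upstream/latest", this should exit non-zero for the "evil" example above (diff reports file.txt as differing) and zero for the empty-dir example.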

Let me know! Bye :)
