On 4/3/24 4:21 AM, Adrian Bunk wrote:
On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
...
I figured out a somewhat straight-forward way to check if a given `git
archive` output is cryptographically claimed to be the source input of a
given binary package in either Arch Linux or Debian (or both).

For Debian the proper approach would be to copy Checksums-Sha256 for the
source package to the buildinfo file, and there is nothing where it would
matter whether the tarball was generated from git or otherwise.

I believe this to be the "reproducible source tarball" thing some people
have been asking about.
...

The lack of a reliably reproducible checksum when using "git archive" is
the problem, and git cannot realistically provide that.

Even when called with the same parameters, "git archive" executed in
different environments might produce different archives for the same
commit ID.

It is documented that auto-generated GitHub tarballs for the same tag
and with the same commit ID downloaded at different times might have
different checksums.

Granted, it takes some skill to take snapshots that match what GitHub generates (and there are occasional issues), but generally speaking it works quite well. The required command is in the README, and I encourage you to give it a try.

If you want something that's explicitly designed for taking reproducible VCS snapshots you could also consider the "Nix Archive" format[0], however I think more people would be in favor of agreeing on how to canonically derive a given git tree into a `.tar.gz` (or at least .tar) instead of switching Debian to the .nar file format.

[0]: https://github.com/ebkalderon/libnar

I think regular `git archive` is already pretty good. Complaining that it may only work in 98% of cases is, I'd say, a Luxusproblem (a luxury problem) considering the current state of things. The next paragraph is the bigger headache:

This tool highlights the concept of "canonical sources", which is supposed
to give guidance on what to code review.
...

How does it tell the git commit ID the tarball was generated from?

Doing a code review of git sources as tarball would be stupid,
you really want the git metadata that usually shows when, why and by
whom something was changed.

It doesn't. It works like a one-way function: it can verify that a given VCS snapshot is definitely the source code that was ingested into Debian, but it can't locate the source code on its own.
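To illustrate the "one-way" property, here is a minimal Python sketch of such a lookup. It is not the actual backseat-signed code. It assumes the usual layout of a Sources index, where each stanza carries a Checksums-Sha256 field with one "hash size filename" entry per line. The function names `checksums_for` and `lookup` and the demo data are made up for illustration.

```python
import hashlib


def checksums_for(sources_text, package):
    """Collect {sha256: filename} from the Checksums-Sha256 field of `package`."""
    found = {}
    for stanza in sources_text.split("\n\n"):
        lines = stanza.splitlines()
        if f"Package: {package}" not in lines:
            continue
        in_field = False
        for line in lines:
            if line.startswith("Checksums-Sha256:"):
                in_field = True
            elif in_field and line.startswith(" "):
                digest, _size, filename = line.split()
                found[digest] = filename
            else:
                # Any other field ends the Checksums-Sha256 block.
                in_field = False
    return found


def lookup(data, sources_text, package):
    """Return the indexed filename whose hash matches `data`, or None."""
    digest = hashlib.sha256(data).hexdigest()
    return checksums_for(sources_text, package).get(digest)


# Synthetic demo data, not a real Debian index.
tarball = b"pretend these are tarball bytes"
index = (
    "Package: demo\n"
    "Version: 1.0-1\n"
    "Checksums-Sha256:\n"
    f" {hashlib.sha256(tarball).hexdigest()} {len(tarball)} demo_1.0.orig.tar.gz\n"
    "Files:\n"
    " 0123 10 demo_1.0.orig.tar.gz\n"
)
print(lookup(tarball, index, "demo"))
print(lookup(b"something else", index, "demo"))
```

The lookup can only confirm or reject a file you already have in hand; nothing in the index points back to a repository or commit.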

I don't know if Debian has this kind of provenance information available. To my knowledge, Debian operates on "our maintainers upload .tar.xz files into our archive and we take them at face value". This does make sense, considering not every software project uses git: some may develop their own VCS, and some have no VCS at all, with just one person applying patches to a folder on their local computer and uploading .tar snapshots to a webserver every other month.

There are some packages that have some kind of system behind them. For example, rust-toml_0.5.11.orig.tar.gz in the Debian Archive can be expected to match <https://crates.io/api/v1/crates/toml/0.5.11/download> (although sometimes files get excluded from the tar upload). I'd like to explicitly encourage people to point me in the right direction if there's any existing effort of mapping Debian .orig.tar.gz files to git tags (not necessarily bit-for-bit, but at least which commit we expect it to come from).

https://github.com/kpcyrd/backseat-signed

The README
...

"This requires some squinting since in Debian the source tarball is
  commonly recompressed so only the inner .tar is compared"

This doesn't sound true.

I've updated the wording and intend to investigate this further. By default the relevant command even expects an exact match. For example this works:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO backseat_signed::plumbing] File verified successfully
```

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Searching in index...
Error: Could not find source tarball with matching hash in source index
```

Being able to disregard the compression layer is still necessary, however, because Debian (as far as I know) never takes the hash of the inner .tar file but only of the compressed one. Because of this you may still need to provide `--orig <path>` if you want to compare with an uncompressed tar.
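A rough Python sketch of what "disregarding the compression layer" means: strip gzip/xz/bzip2 if present, then compare hashes of the inner .tar. This is only an illustration of the idea, not how backseat-signed implements it. The names `inner_tar` and `same_inner_tar` are hypothetical, and everything is read into memory for brevity.

```python
import bz2
import gzip
import hashlib
import lzma

# Magic bytes at the start of each container format.
DECOMPRESSORS = {
    b"\x1f\x8b": gzip.decompress,
    b"\xfd7zXZ\x00": lzma.decompress,
    b"BZh": bz2.decompress,
}


def inner_tar(data):
    """Return the decompressed payload, or `data` itself if no magic matches."""
    for magic, decompress in DECOMPRESSORS.items():
        if data.startswith(magic):
            return decompress(data)
    return data  # assumed to be an uncompressed .tar already


def same_inner_tar(a, b):
    """True if both inputs carry byte-identical inner payloads."""
    return hashlib.sha256(inner_tar(a)).digest() == hashlib.sha256(inner_tar(b)).digest()


# Synthetic payload standing in for a real tar file.
payload = b"fake tar payload " * 64
as_gz = gzip.compress(payload)
as_xz = lzma.compress(payload)

print(hashlib.sha256(as_gz).hexdigest() == hashlib.sha256(as_xz).hexdigest())
print(same_inner_tar(as_gz, as_xz))
```

The outer hashes differ between the .gz and .xz variants, which is why an exact-match lookup rejects a repacked file, while the inner comparison still succeeds.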

Here's an example of how you'd verify vim_9.1.0199.orig.tar.xz in Debian was taken from `https://github.com/vim/vim#tag=v9.1.0199`:

```
% git clone --branch v9.1.0199 https://github.com/vim/vim
% git -C vim rev-parse HEAD
ad38769030b5fa86aa0e8f1f0b4266690dfad4c9
% git -C vim archive --prefix="vim-9.1.0199/" -o vim-9.1.0199.tar v9.1.0199
% sha256sum vim-9.1.0199.tar
166f319a31a4eada3d181d80780f8581b11cf6fac61e57e73ef26a1e183eaed0 vim-9.1.0199.tar
```

Take Sources.xz from here:

https://snapshot.debian.org/archive/debian/20240324T210425Z/dists/sid/main/source/Sources.xz

sha256:ba14ca35563ace9dc1e81446f6d72979cdc5aa7ea5c558cb0fe5071736c602b2

And vim_9.1.0199.orig.tar.xz from here:

https://snapshot.debian.org/archive/debian/20240324T210425Z/pool/main/v/vim/vim_9.1.0199.orig.tar.xz

sha256:a3284e44b55a7877f3b0bbb1b0a349748e3b48f9d1e1c9d0f93856f7be417dda

You can verify it all checks out like this:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --orig vim_9.1.0199.orig.tar.xz --name vim vim-9.1.0199.tar
[2024-04-04T19:09:40Z INFO backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T19:09:41Z INFO backseat_signed::plumbing] Loading file from "vim-9.1.0199.tar"
[2024-04-04T19:09:41Z INFO backseat_signed::plumbing] Loading Debian .orig.tar from "vim_9.1.0199.orig.tar.xz"
[2024-04-04T19:09:42Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T19:09:42Z INFO backseat_signed::plumbing] File verified successfully
```

Tada.

Of course there's also a subcommand to check a given Sources.xz belongs to a given Release/Release.gpg combination. There's no support for InRelease yet.
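For the curious, the hash half of that check can be sketched in Python as below. It assumes the usual Release layout, where a `SHA256:` field lists "hash size path" entries on indented lines. Verifying the Release.gpg signature itself is deliberately left out, and the function names are hypothetical rather than anything from backseat-signed.

```python
import hashlib


def release_sha256_entries(release_text):
    """Parse the SHA256 field of a Release file into {path: (hex_digest, size)}."""
    entries = {}
    in_sha256 = False
    for line in release_text.splitlines():
        if line.startswith(" "):
            if in_sha256:
                digest, size, path = line.split()
                entries[path] = (digest, int(size))
        else:
            # A non-indented line starts a new field.
            in_sha256 = line.startswith("SHA256:")
    return entries


def index_matches_release(release_text, path, data):
    """True if `data` has the size and SHA-256 that Release lists for `path`."""
    expected = release_sha256_entries(release_text).get(path)
    if expected is None:
        return False
    digest, size = expected
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


# Synthetic demo data, not a real Release file.
sources_xz = b"pretend this is Sources.xz"
release = (
    "Origin: Example\n"
    "Suite: unstable\n"
    "MD5Sum:\n"
    " 00000000000000000000000000000000 26 main/source/Sources.xz\n"
    "SHA256:\n"
    f" {hashlib.sha256(sources_xz).hexdigest()} {len(sources_xz)} main/source/Sources.xz\n"
)
print(index_matches_release(release, "main/source/Sources.xz", sources_xz))
```

Without the signature check this only shows that the index is the one named in that Release file, not that the Release file itself is authentic.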

The tool wasn't able to take .tar directly before. I just built this.
Just for you. 🖤

I've checked both upstream's GitHub release page and their website[1], but couldn't find any mention of .tar.xz, so I think my claim of Debian doing the compression is fair.

[1]: https://www.vim.org/download.php

cheers,
kpcyrd
