Hi,

Assisted by AI, I generated the summary of the discussion below. I
reviewed it and it matches my own general understanding of the
discussion.

It was the result of several successive prompts, not a single one, and I
did not save them (I should have). I chose not to edit the text manually
to fix the remaining minor inaccuracies, so this can also serve as a fair
example of what could be achieved semi-automatically.

-----

This document summarizes the discussions regarding the integration of Git 
metadata into Debian source packages, the "Git vs. Tarball" debate, and the 
auditing of the software supply chain.

## Part I: Architecture & The "Git vs. Archive" Debate

### 1.1 The Core Proposal vs. `tag2upload`

**Otto Kekäläinen** initiated the discussion by proposing the addition of 
`Git-Commit-Id` and `Git-Tree-Id` fields to `.changes` (and potentially `.dsc`) 
files. The goal was to allow maintainers to explicitly declare which Git commit 
a package upload corresponds to, facilitating audits of the software supply 
chain.

Otto argues for explicit metadata to capture the maintainer's *intent* 
regarding the source commit.
> *Ref:* `<caou6taa3ruzuc3jkhab0tzofptwwwdvzqceiy6cowriqk5r...@mail.gmail.com>`
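
To make the proposal concrete, here is a minimal Python sketch of how an audit tool might read such fields from a `.changes` paragraph. The field names `Git-Commit-Id` and `Git-Tree-Id` come from the proposal above. The sample values, the helper names, and the assumption that any OpenPGP armor has already been stripped are purely illustrative.

```python
import re
from typing import Dict

# Accept SHA-1 (40 hex digits) or SHA-256 (64 hex digits) object names.
OBJECT_ID = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def parse_deb822_fields(text: str) -> Dict[str, str]:
    """Parse one deb822-style paragraph into a {field: value} mapping."""
    fields: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Continuation line: append it to the most recent field.
            if current is not None:
                fields[current] += "\n" + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line, ignore it
        current = name.strip()
        fields[current] = value.strip()
    return fields


def declared_git_ids(changes_text: str) -> Dict[str, str]:
    """Return the proposed Git fields that are present and well-formed."""
    fields = parse_deb822_fields(changes_text)
    result = {}
    for name in ("Git-Commit-Id", "Git-Tree-Id"):
        value = fields.get(name, "")
        if OBJECT_ID.fullmatch(value):
            result[name] = value
    return result


# Hypothetical .changes excerpt; the IDs are made-up placeholders.
SAMPLE = """\
Format: 1.8
Source: hello
Version: 2.10-3
Git-Commit-Id: 0123456789abcdef0123456789abcdef01234567
Git-Tree-Id: 89abcdef0123456789abcdef0123456789abcdef
Changes:
 hello (2.10-3) unstable; urgency=medium
 .
   * Example entry.
"""

print(declared_git_ids(SAMPLE))
```

A well-formed field only records what the uploader declared. As the following replies point out, it is an attestation and not a proof.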

**Ian Jackson** (for the `tag2upload` team) argues that manual fields provide 
"attestation without proof." He advocates for `tag2upload`, which uses signed 
Git tags to drive the upload, creating a cryptographic chain of trust. He 
asserts that `Git-Tag-Info` fields are reserved for this automated protocol.
> *Ref:* `<[email protected]>`

**Sean Whitton** clarifies that the `Git-Tag-*` fields in Policy are 
specifically for the `tag2upload` protocol to link an automated upload back to 
the human signer.
> *Ref:* `<[email protected]>`

### 1.2 Traceability: Tar Archives vs. Git Repositories

A fundamental debate occurred regarding which format—tarballs or Git 
repositories—is better suited for long-term project stability and auditing.

**Arguments for Tarballs**:
*   **Simon Richter** argued that tarballs are stable, mirrorable artifacts. 
Unlike Git repositories, which are "moving targets," a signed tarball is a 
static file that can be easily distributed via Debian's existing mirror network 
and preserved for decades.
    > *Ref:* `<[email protected]>`
*   **Salvo Tomaselli** noted that users appreciate the uniformity of `apt 
source`, which abstracts away the peculiarities of individual Git repository 
structures.
    > *Ref:* 
`<CAOOnQAfpY6LZDTZhzx9QqE=ys2lgpffzki0xa9rqghybcr7...@mail.gmail.com>`

**Arguments for Git**:
*   **Ian Jackson** contended that tarballs are "intermediate build products" 
rather than true source code. He argued that the Git tree is the only format 
that provides the high-resolution history necessary for modern auditing.
    > *Ref:* `<[email protected]>`
*   **The "Smuggling" Critique**: Ian Jackson described the effort to make 
tarballs bit-identical to Git via `pristine-tar` as an unnecessarily complex 
way to "smuggle" a Git commit ID through a legacy format.
    > *Ref:* `<[email protected]>`

### 1.3 Git Bundles as Alternative Artifacts

A discussion emerged about replacing the source tarball entirely with a 
different file-based Git format.

**The Proposal**:
*   **Simon Richter** suggested uploading **git bundles** (`.bundle` files) 
instead of tarballs.
    *   **Benefits**: Like tarballs, they are single files that are easy to 
mirror and archive. Unlike tarballs, they are native Git objects that can 
contain the signed tag itself, preserving the full cryptographic chain without 
needing a live Git server during the build.
    > *Ref:* `<[email protected]>`

**The Reproducibility Challenge**:
*   **Simon Josefsson** pointed out a significant flaw: **reproducibility**. 
While `git archive` produces a tarball from a specific *tree* (state), a git 
bundle typically packs a range of commits. Creating a bit-identical bundle for 
an old commit is difficult if the repository has since acquired new branches or 
tags, whereas tarball generation is generally more deterministic (modulo 
compression).
    > *Ref:* `<[email protected]>`

### 1.4 Source Format "3.0 (git)"

The discussion touched on the ultimate solution to the Git-to-archive link: a 
native Git source format.

*   **Long-term Goal**: **Daniel Gröber** expressed interest in developing a 
**Source format 3.0 (git)**, which would allow the Debian archive to store Git 
trees directly rather than tarballs.
*   **Infrastructure Blockers**: Daniel noted that this work was previously 
slowed by perceived disinterest from the FTP team, but there is hope that 
recent changes in the project will allow this effort to move forward.
    > *Ref:* `<a4loydryoqagkwq6hyzrk6zflf5vttexu32axagqctvej52t2s@7q3mdhdifrm5>`

### 1.5 Trust Model: Human vs. Automated Signatures

A key architectural shift discussed was the transition of the cryptographic 
"root of trust" in the `tag2upload` model.

*   **Service Signatures**: Ian Jackson clarified that when using `tag2upload`, 
the PGP signature on the uploaded `.changes` file is produced by the 
automation, not the maintainer. This signature only attests that the service 
correctly processed the Git tag.
*   **Accountability**: To preserve the link to the human maintainer, the 
service includes the maintainer's signed Git tag metadata (info and signature) 
directly within the `.changes` file.
*   **The Audit Challenge**: Critics argued that this decoupling is a 
significant departure from Debian tradition. They expressed concern that it 
complicates auditing, as tools can no longer rely on the `.changes` signature 
alone to identify the human author, but must instead parse and verify nested 
Git signatures.
    > *Ref:* `<[email protected]>`

---

## Part II: Workflows & Tooling Conflicts

### 2.1 The `git-buildpackage` (gbp) Controversy

A technical dispute arose regarding the relationship between `git-buildpackage` 
and `dgit`, and the best practices for importing upstream sources.

**Otto** initially claimed `dgit` and `gbp` were competing systems. **Nikolaus 
Rath** and **Ian Jackson** corrected this, noting that `dgit` works *with* 
`gbp` (and even depends on it).
> *Ref:* `<[email protected]>`
> *Ref:* `<[email protected]>`

The discussion then moved to the best way to import upstream sources:

*   **`gbp import-orig`**:
    *   **Ian Jackson** argued strongly against this command because it imports 
the opaque upstream *tarball* into Git, rather than the transparent upstream 
Git history. He cited the **XZ backdoor** as a prime example, noting that 
importing the tarball relieves an attacker of the need to commit the malicious 
code to the public Git repository where it might be spotted.
        > *Ref:* `<[email protected]>`
    *   **Otto** argued that `gbp import-orig --uscan` is necessary for *new* 
upstream versions because `debian/watch` is the canonical entry point for 
Debian's infrastructure (like `vcswatch` and `debaudit`). He maintained that 
importing the tarball is actually *required* to detect if it differs from the 
Git tag (as he demonstrated with his own analysis of the XZ backdoor).
        > *Ref:* 
`<CAOU6tADq4+Eb0GkTZJhvM658ZnncM4OQdjtJh5_+qbn=gxy...@mail.gmail.com>`

*   **`gbp import-ref`**:
    *   **Gioele Barabucci** pointed out the existence of `gbp import-ref`, 
which imports a Git reference (tag or commit) instead of a tarball.
    *   Ian Jackson acknowledged this as a better potential workflow, possibly 
combined with `uscan mode=git`.

*   **The Hybrid Solution (`import-orig` with `upstream-vcs-tag`)**:
    *   **Otto** clarified that `gbp import-orig` is not just about dumping a 
tarball. By using the `--upstream-vcs-tag` option (or `upstream-vcs-tag` in 
`gbp.conf`), `gbp` creates a commit that imports the tarball but sets its 
**parent** to the upstream Git tag.
    *   **Auditability**: This links the archive artifact (the tarball) 
directly to the upstream Git history.
        *   If the tarball matches the Git tag exactly, the tree IDs match.
        *   If they differ (e.g., generated files, or a malicious injection 
like the XZ backdoor), the import commit represents **exactly the delta** 
between the upstream Git history and the released tarball, making the 
difference immediately visible to auditors.
        > *Ref:* 
`<CAOU6tADq4+Eb0GkTZJhvM658ZnncM4OQdjtJh5_+qbn=gxy...@mail.gmail.com>`
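
The idea of the import commit as "exactly the delta" can be illustrated with a small Python sketch that compares a tarball's contents with a manifest of the files at the upstream tag. It is not how `gbp` works internally. The manifest format (path mapped to a SHA-256 content digest), the function names, and the assumption that the tarball has a single top-level directory are all illustrative choices.

```python
import hashlib
import io
import tarfile
from typing import Dict, List


def tarball_manifest(archive: bytes, strip_components: int = 1) -> Dict[str, str]:
    """Map each regular file in the tarball to the SHA-256 of its content."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the leading "pkg-1.0/" style directory, if any.
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            content = tar.extractfile(member).read()
            manifest["/".join(parts)] = hashlib.sha256(content).hexdigest()
    return manifest


def manifest_delta(git_manifest: Dict[str, str],
                   tar_manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that exist on only one side or differ in content."""
    git_paths, tar_paths = set(git_manifest), set(tar_manifest)
    return {
        "only_in_tarball": sorted(tar_paths - git_paths),
        "only_in_git": sorted(git_paths - tar_paths),
        "modified": sorted(
            path for path in git_paths & tar_paths
            if git_manifest[path] != tar_manifest[path]
        ),
    }
```

An empty delta corresponds to the "tree IDs match" case. Generated files such as a `configure` script would typically show up under `only_in_tarball`, and a tampered build script would show up under `modified`.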

### 2.2 `pristine-tar` Support

The lack of `pristine-tar` support in `tag2upload` was a major point of 
friction, revealing a philosophical split in how Debian sources should be 
managed.

**The Issue**: Maintainers using `pristine-tar` (a tool to recreate 
bit-identical upstream tarballs from Git) cannot currently use `tag2upload`.
*   **Otto** and **Simon Josefsson** argued this breaks the workflow for 
verifying **detached upstream OpenPGP signatures**. If the tarball cannot be 
exactly reproduced, the signature provided by upstream cannot be verified, 
removing a key security check.
    > *Ref:* 
`<caou6tadegl8vz+kgiqbbucfgoztycboui7v_boqoefu_tym...@mail.gmail.com>`

**The "Hack" vs. "Truth"**:
*   **Ian Jackson** acknowledged the issue but described `pristine-tar` as a 
"hack" designed to smuggle binary artifacts (the tarball) through Git. He 
argued that the "modern" view should treat the **Git tree** itself as the 
source of truth, making the tarball an implementation detail.
    > *Ref:* `<[email protected]>`

**The Resolution**:
*   **Sean Whitton** stated the position clearly: users should either stop 
using `pristine-tar` (and trust the end-to-end Git-to-archive path provided by 
`tag2upload`) or someone needs to step up and implement the support for it, 
which has been scoped out but not done.
    > *Ref:* `<[email protected]>`

### 2.3 "Legacy Crap" vs. "Traditional Workflow"

The discussion became heated over terminology used to describe non-Git-centric 
workflows.

Ian Jackson referred to the current standard (tarball-based) approach as 
"legacy" and "crap" in the context of modern revision control.
> *Ref:* `<[email protected]>`

**Antoine Le Gonidec** took offense, stating that such language devalued the 
work of contributors using established methods and threatened to leave the 
discussion.
> *Ref:* `<[email protected]>`

**Holger Levsen** proposed using the term **"Traditional Workflow"** instead of 
"legacy" to describe `uscan` + `orig.tar.gz` based packaging.
> *Ref:* `<[email protected]>`

Ian Jackson apologized for the tone and the impact of his words.
> *Ref:* `<[email protected]>`

### 2.4 NMU Handling and Git Complexity

A practical concern was raised about how non-maintainer uploads (**NMUs**) 
interact with Git-centric workflows.

**Maintainer Diversity**:
*   **Adrian Bunk** argued that NMUs are often better handled outside of Git 
because every maintainer has different preferences for how NMU changes should 
be merged (e.g., some prefer MRs, others force-pushes). He noted that for many 
NMUs, a matching Git commit may not even exist at the time of upload.
    > *Ref:* `<aUnVXYNNLpeRFwgr@localhost>`

**Standardized History**:
*   **Ian Jackson** promoted the use of `dgit push-source` for NMUs, which 
creates a standardized Git history for the NMU regardless of the maintainer's 
internal workflow on Salsa.
    > *Ref:* `<[email protected]>`

---

## Part III: Auditability & Provenance Verification

### 3.1 Auditability and Debaudit

**Lucas Nussbaum** introduced a new tool to audit the link between Git and the 
archive.

`debaudit.debian.net` attempts to verify if the `.dsc` in the archive matches 
the upstream Git repository.
> *Ref:* `<[email protected]>`

Lucas reported that while ~75% of packages match, there is a significant 
"provenance gap" for others, including ~8% of upstream tarballs in `sid` that 
cannot be verified against upstream Git tags.
> *Ref:* `<[email protected]>`

### 3.2 Reproducibility & `git archive`

A technical discussion addressed the stability of `git archive` output.

**Simon Josefsson** noted that `git archive` output changes over time 
(compression, etc.), making it hard to use as a stable "pristine" source 
without signatures.
> *Ref:* `<[email protected]>`
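
The following Python sketch illustrates one reason why compressed archives are hard to reproduce byte for byte: the same payload can compress to different bytes when compression metadata differs. It uses the timestamp field of the gzip header as the example, which is just one possible source of variation and not necessarily the one discussed in the thread. The `mtime` argument of `gzip.compress` requires Python 3.8 or later.

```python
import gzip
import hashlib


def compressed_digest(payload: bytes, mtime: int) -> str:
    """SHA-256 of the gzip-compressed payload, with a given header timestamp."""
    return hashlib.sha256(gzip.compress(payload, mtime=mtime)).hexdigest()


def content_digest(compressed: bytes) -> str:
    """SHA-256 of the decompressed content, ignoring compression metadata."""
    return hashlib.sha256(gzip.decompress(compressed)).hexdigest()


payload = b"identical tar stream\n" * 100

# Same content, different header timestamp: the compressed bytes differ...
print(compressed_digest(payload, 0) == compressed_digest(payload, 1))

# ...while the decompressed content is still identical.
print(content_digest(gzip.compress(payload, mtime=0))
      == content_digest(gzip.compress(payload, mtime=1)))
```

A detached signature covers the compressed bytes, so it only verifies if the file is reproduced exactly, even when the content inside is unchanged.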

**kpcyrd** (Arch Linux) and Simon discussed how `.gitattributes` (like 
`export-subst`) affect the generated tarball, complicating the verification 
process if Debian doesn't respect them exactly as upstream does.
> *Ref:* `<[email protected]>`

**Common Tooling**:
*   **Simon Richter** asked if a common tool could be used by both `uscan` and 
`tag2upload` to generate orig archives from Git trees, ensuring consistency.
*   **Ian Jackson** pointed out that **`git-deborig`** already exists for this 
purpose.
    > *Ref:* `<[email protected]>`

### 3.3 Git Commit IDs in Tar Archives

The discussion highlighted a technical "middle ground": embedding Git metadata 
directly within tarball artifacts.

**Tar Pax Headers**:
*   **Lucas Nussbaum** pointed out that `git archive` (and GitHub's tag-based 
tarballs) automatically include the source Git commit ID in a **tar pax 
header**.
*   **Prevalence**: Lucas's `debaudit` research found that **35% of 
bit-identical `orig.tar` files** in Debian Unstable already contain these 
embedded commit IDs, which his tool uses for verification.
    > *Ref:* `<[email protected]>`
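
As a rough illustration, the embedded ID can be read with Python's standard `tarfile` module. `git archive` stores the commit ID in a pax global extended header under the `comment` keyword, which is what `git get-tar-commit-id` reads. The function name below is an assumption, and this sketch is not how `debaudit` itself is implemented.

```python
import io
import tarfile
from typing import Optional


def embedded_commit_id(archive: bytes) -> Optional[str]:
    """Return the 'comment' value of the pax global header, if there is one."""
    # "r:*" lets tarfile detect gzip, bzip2, or xz compression transparently.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        # tarfile collects global pax headers while reading the first member.
        return tar.pax_headers.get("comment")
```

Calling `embedded_commit_id(open("pkg_1.0.orig.tar.gz", "rb").read())` (file name hypothetical) yields the commit ID for a tarball produced by `git archive` from a commit or tag, and `None` when no such header exists. The value is unauthenticated metadata, so it tells an auditor where to look, not what to trust.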

**Security and Audit Value**:
*   **The XZ Backdoor Case**: **Otto** used the XZ backdoor as an example of 
why these IDs are valuable. The embedded commit ID allowed auditors to identify 
the specific Git base, making the malicious changes injected into the tarball 
(but missing from Git) immediately obvious.
    > *Ref:* 
`<CAOU6tADq4+Eb0GkTZJhvM658ZnncM4OQdjtJh5_+qbn=gxy...@mail.gmail.com>`
*   **`export-subst` complications**: Otto noted that the Git `export-subst` 
attribute can populate version strings within files during export. This creates 
a legitimate reason for a tarball to differ from a raw Git tree, a nuance that 
auditors must consider.
    > *Ref:* 
`<CAOU6tAAvJx83E=de-ywjobxtjesypbr-os+kroufurdjpyr...@mail.gmail.com>`
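
One way an audit tool could account for this nuance is to check whether the Git version of a differing file contains `$Format:...$` placeholders, which Git expands during `git archive` for files marked `export-subst`. The Python sketch below only detects placeholders and does not expand them. The function name is an assumption, and checking the `.gitattributes` rules themselves is left out.

```python
import re
from typing import List

# Git expands "$Format:<pretty-format>$" in files with the export-subst attribute.
FORMAT_PLACEHOLDER = re.compile(rb"\$Format:([^$]*)\$")


def export_subst_placeholders(blob: bytes) -> List[str]:
    """List the pretty-format strings of all $Format:...$ placeholders."""
    return [
        match.group(1).decode("utf-8", errors="replace")
        for match in FORMAT_PLACEHOLDER.finditer(blob)
    ]


# Example: a version file as it might look in the Git tree.
print(export_subst_placeholders(b'__commit__ = "$Format:%H$"\n'))
```

A file that differs between Git and the tarball and contains such a placeholder is a candidate for a legitimate difference, though the expanded value would still need to be compared with the expected commit data.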

---

## Part IV: Policy, Security & Governance

### 4.1 Security (SHA-1 vs. SHA-256)

A side discussion emerged regarding the cryptographic strength of the hash 
algorithms used by Git.

**The Concern**:
*   **Simon Josefsson** argued that relying on SHA-1 is "turning a blind eye to 
reality," given that concerns about its security date back to 2004 and a 
practical collision (SHAttered) was demonstrated in 2017. He suggested that 
Debian should be preparing for a transition to SHA-256.
    > *Ref:* `<[email protected]>`

**The Rebuttal**:
*   **Ian Jackson** acknowledged the long-term need to move away from SHA-1 but 
characterized the risk of *second preimage attacks* (which would be needed to 
forge a Git commit with the same hash) as "largely theoretical" right now. He 
noted that Git's hardened SHA-1 has not seen practical collisions in this 
context.
    > *Ref:* `<[email protected]>`

**Platform Support**:
*   It was noted that while platforms like **GitLab** and **Codeberg** already 
support SHA-256 Git repositories, **Salsa** (Debian's GitLab instance) may not 
yet be fully configured for it, and the transition path for existing 
repositories is complex.
    > *Ref:* `<[email protected]>`
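
For readers unfamiliar with how Git derives object names, the following Python sketch computes a blob ID under both hash algorithms. Git hashes a short header (`blob <size>` followed by a NUL byte) together with the content. Plain `hashlib` SHA-1 stands in here for Git's collision-detecting SHA-1 implementation, which yields the same digest for ordinary inputs. The function name is an assumption.

```python
import hashlib


def git_blob_id(content: bytes, algorithm: str = "sha1") -> str:
    """Compute the Git object name of a blob under the given hash algorithm."""
    # Git hashes "blob <size>\0" followed by the raw file content.
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algorithm, header + content).hexdigest()


# The empty blob under SHA-1 (a well-known Git constant) and under SHA-256.
print(git_blob_id(b""))
print(git_blob_id(b"", "sha256"))
```

The first value matches what `git hash-object` prints for an empty file in a SHA-1 repository. The same content has an unrelated 64-digit name in a SHA-256 repository, which is part of why converting existing repositories is not a simple switch.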

### 4.2 Non-Free Content in Git History

A licensing and policy debate emerged regarding the presence of non-free files 
in the upstream Git history imported into Debian repositories.

**The Concern**:
*   **Simon McVittie** questioned whether maintainers could face sanctions for 
allowing non-free files (e.g., in `upstream/` branches) to exist in the 
packaging Git repository's history, even if they are excluded from the source 
package uploaded to the archive. He noted the ambiguity of "source" in a 
Git-centric world.
    > *Ref:* `<[email protected]>`

**The `dgit` Stance**:
*   **Ian Jackson**, speaking as a `dgit` delegate, stated that their policy is 
that this is acceptable. He noted that `dgit-repos` and `salsa` have hosted 
such histories for years without issue, provided the distributed binaries do 
not rely on the non-free bits.
    > *Ref:* `<[email protected]>`

**The Counter-Point**:
*   **Ansgar** argued that if the `dgit` model treats the *entire Git 
repository* as the "source" (preferred form of modification), then distributing 
a repository with non-free history might violate the Debian Social Contract. He 
contrasted this with the traditional model where the `.dsc` defines the 
boundary of what is distributed.
    > *Ref:* `<[email protected]>`

**Proposed Solutions**:
*   **Simon Josefsson** suggested that maintainers could "prune" or rewrite 
upstream history to remove non-free blobs before importing, similar to how 
Linux-libre operates. Alternatively, `git rm` could be used to remove files 
from the current branch while acknowledging their presence in history (similar 
to `snapshot.debian.org` preserving historical non-free packages).
    > *Ref:* `<[email protected]>`

### 4.3 Governance: Policy Authorship Controversy

A heated side discussion emerged regarding the authority of Debian Policy in 
this debate.

*   **The Circular Argument**: **Otto Kekäläinen** criticized **Sean Whitton** 
for rejecting the use of `Git-Tag-*` fields on the grounds that "Policy forbids 
it," without disclosing that Sean himself had authored and committed that 
specific section of Policy very recently. Otto argued this was a form of 
circular reasoning used to shut down valid technical proposals.
    > *Ref:* 
`<CAOU6tAAvJx83E=de-ywjobxtjesypbr-os+kroufurdjpyr...@mail.gmail.com>`
*   **The Defense**: **Sean Whitton** responded that `policy.git` reflects the 
current consensus of the Policy Editors. He explained that the relevant changes 
had been committed to the repository weeks prior to the discussion, meaning 
they were already the official standard, and his publication of the new version 
was merely an administrative update.
    > *Ref:* `<[email protected]>`

---

## Part V: Technical Implementation Challenges

### 5.1 Infrastructure: Upload Queue Visibility

There was a side discussion about the interaction between upload tools 
(`dgit`/`tag2upload`) and the Debian archive infrastructure (`dak`).

**The Problem**:
*   **Ian Jackson** noted that it is currently hard for tools to know if an 
upload has been **REJECTED** or is still **QUEUED**, as the `ftp-master` API 
does not expose the state of the upload queue.
    > *Ref:* `<[email protected]>`

**FTP Master Response**:
*   **Joerg Jaspert** explained that the `queued` daemon is separate from the 
main archive database (`projectb`) and doesn't write to it, making it hard to 
expose this status via the existing API.
    > *Ref:* `<[email protected]>`

**Proposed Solution (Callbacks)**:
*   **Timo Röhling** suggested implementing a **callback mechanism**. 
Maintainers could add a field like `Upload-Receipt: https://...` to their 
`.changes` file. The archive software would then send a notification to that 
URL upon acceptance or rejection, eliminating the need for tools to poll the 
archive or guess the status.
    > *Ref:* `<[email protected]>`
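
A very rough Python sketch of what the archive side of such a callback could look like follows. Only the idea of an `Upload-Receipt` URL comes from the thread. The JSON payload, its keys, the status names, the HTTPS-only rule, and the function name are all hypothetical, and nothing here reflects actual `dak` behaviour. The sketch builds the request without sending it.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical status vocabulary for the notification.
VALID_STATUSES = ("ACCEPTED", "REJECTED")


def build_receipt_request(receipt_url: str, source: str,
                          version: str, status: str) -> Request:
    """Build (but do not send) a POST notifying the uploader's receipt URL."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if urlsplit(receipt_url).scheme != "https":
        # Assumption: only HTTPS endpoints would be allowed.
        raise ValueError("receipt URL must use https")
    body = json.dumps(
        {"source": source, "version": version, "status": status}
    ).encode("utf-8")
    return Request(
        receipt_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be a matter of passing the result to `urllib.request.urlopen` with a timeout. A real design would also need to consider authentication of the notification and what the archive should do when the endpoint is unreachable.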

### 5.2 Support for Monorepos

The technical feasibility of the Git metadata proposal for **monorepos** (where 
multiple Debian packages live in one Git repository) was briefly discussed.

*   **Tree Identification**: **Guillem Jover** noted that tools like 
`dpkg-source` might struggle to find the correct Git root in a monorepo 
containing hundreds of packages.
    > *Ref:* `<[email protected]>`
*   **Plumbing**: **Daniel Gröber** suggested that this could be addressed by 
adding options to `dpkg-source` to explicitly define the monorepo root, 
allowing these packages to still provide commit and tree ID metadata.
    > *Ref:* `<a4loydryoqagkwq6hyzrk6zflf5vttexu32axagqctvej52t2s@7q3mdhdifrm5>`
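
The root-detection problem can be sketched in a few lines of Python: walk upwards from the package directory until a `.git` entry is found, then express the package location relative to that root. This is only an illustration of the lookup, not a description of what `dpkg-source` does or would do, and both function names are assumptions.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor (or start itself) containing a .git entry."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # .git is a directory in normal clones and a file in worktrees
        # and submodules, so only test for existence.
        if (candidate / ".git").exists():
            return candidate
    return None


def package_subdir(package_dir: Path) -> Optional[str]:
    """Path of the package directory relative to the repository root."""
    root = find_git_root(package_dir)
    if root is None:
        return None
    return package_dir.resolve().relative_to(root).as_posix()
```

In a monorepo the automatic answer may not be the intended one, for example when a package directory is itself a nested repository, which is where an explicit option to name the root would help.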

### 5.3 The Challenge of Git Submodules

Technical limitations regarding **Git submodules** were raised as a barrier to 
moving away from manual repacking.

*   **Complexity**: **Simon Richter** highlighted that some upstreams use 
nested submodules with inconsistent linking (relative vs. absolute), making it 
extremely difficult to automate their import via traditional tools like `uscan`.
    > *Ref:* `<[email protected]>`
*   **Tooling Gap**: **Ian Jackson** admitted that `tag2upload` currently lacks 
support for multiple `.orig` tarballs often required for submodule-heavy 
projects, though `dgit` provides some manual workarounds.
    > *Ref:* `<[email protected]>`
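
To give a sense of the linking inconsistency, this Python sketch reads a `.gitmodules` file and classifies each submodule URL as relative or absolute. Relative submodule URLs start with `./` or `../` and are resolved against the superproject's own remote. The sample content and the function names are made up for illustration, and nested submodules would require repeating this for each checked-out submodule.

```python
import configparser
from typing import Dict


def submodule_urls(gitmodules_text: str) -> Dict[str, str]:
    """Map submodule names to their configured URLs."""
    parser = configparser.ConfigParser(interpolation=None)
    # Strip indentation so configparser does not treat keys as continuations.
    normalized = "\n".join(line.strip() for line in gitmodules_text.splitlines())
    parser.read_string(normalized)
    urls = {}
    prefix = 'submodule "'
    for section in parser.sections():
        if section.startswith(prefix) and section.endswith('"'):
            name = section[len(prefix):-1]
            urls[name] = parser.get(section, "url", fallback="")
    return urls


def is_relative_url(url: str) -> bool:
    """Relative URLs depend on where the superproject was cloned from."""
    return url.startswith(("./", "../"))


# Hypothetical .gitmodules mixing both styles.
SAMPLE_GITMODULES = """\
[submodule "libfoo"]
	path = third_party/libfoo
	url = ../libfoo.git
[submodule "libbar"]
	path = third_party/libbar
	url = https://example.org/libbar.git
"""

for name, url in submodule_urls(SAMPLE_GITMODULES).items():
    print(name, url, "relative" if is_relative_url(url) else "absolute")
```

A relative URL only resolves correctly when the superproject is fetched from the location upstream expects, which is one reason a generic download tool has trouble automating such imports.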
