Russ Allbery writes ("Re: [RFC] General Resolution to deploy tag2upload"):
> So yes, you're right, the git-debrebase example is not nearly as
> interesting as I had thought because the tooling works differently than I
> had realized.

As ever, it's all more complicated than you thought (and than you now
think).  I'm going to give just a few examples of the frantic paddling
that dgit is doing underneath the waterline.  This is therefore an
*extremely* long message.

First, though, I want to summarise:

 In this message I discuss five packaging workflows.

 For the three current workflows I explain how they work in some
 detail, including some of the wrinkles, anomalies and complications
 that dgit currently deals with, and that tag2upload takes care of.
 For the two future workflows (one near future, and one speculative)
 I sketch out what the support might look like.

 I also discuss my understanding of the alternative design proposed by
 some ftpmasters.  In each case, the tag2upload design handles the
 situation well.  In each case, the alternative design works
 significantly less well, or requires significantly more complexity in
 more places - usually both.  In some cases the alternative design
 can't sensibly work at all.

 I want to emphasise that these are *examples*.  I feel we have
 spent much of this thread (and much of our previous conversations)
 playing whack-a-mole with "but you could fix that anomaly by doing
 X" and "you could handle that other situation by doing Y", where X
 and Y are each not great, but might perhaps be tolerable if they
 were the only limitation.

 So, yes, it is true that in *some* of these cases, including
 perhaps many actual packages in practice, the alternative design
 could be made to work.  But the alternative design does *not* solve
 all the problems that tag2upload does, and the problems that it
 does solve it handles in a more complicated and ugly way, with more
 limitations.

 Taking this all together, the alternative proposal is sufficiently
 limited in scope, and poor in its outcomes, that it's not worth
 pursuing.


Right then.


1. git-debrebase:

Firstly, this is one of the easier cases from tag2upload's point of
view.  git-debrebase is modern and git-based, so has fewer warts.

It's true that git-debrebase can make patches.

But the calls to git-debrebase that you make as a maintainer do not
make any patches in debian/patches.  Indeed, usually, if git-debrebase
finds anything in debian/patches, it simply deletes it all.

What happens is that dgit has special knowledge about git-debrebase:
it knows that git-debrebase can make patches.  (This is actually there
as an optimisation: git-debrebase can make patches much faster.)

When you do `dgit push-source` (which is how git-debrebase users
upload), dgit knows it may need to make patches, because that's how
a "3.0 (quilt)" source package works.  This is the "quilt fixup" step
of uploading, which is what (for historical reasons) dgit calls
source package canonicalisation.

So iff you are using git-debrebase with "3.0 (quilt)", dgit uses
git-debrebase to make the patches and commit them to your branch.

However, you can also run `dgit push-source --split-view=always`.
This is an alternative workflow.  In that case, the synthetic git
commits which introduce d/patches don't end up in your own maintainer
git branch.  (I'm not sure Russ knows this feature exists.)  This mode
is nicer because you don't get diff noise about changes to the
completely autogenerated contents of d/patches.  Specifically, without
the split view, each upload introduces a bunch of patches onto the
maintainer branch, which the next run of git-debrebase after the
upload immediately deletes.

So in that case the maintainer branch never has patches and isn't
treesame to a "3.0 (quilt)" source package.

Also!  You can use git-debrebase with 1.0-with-diff, or with 1.0
native.  (I'm not sure Russ knows this, either.)  This is often a nice
way of working, for a small package which usually has an empty or tiny
patch queue.  If you do that then there are no patches, ever, just git
commits and an output tarball.  There is one wrinkle, though: you can't use
git-debrebase with "3.0 (native)" because of a bug in dpkg-source [1].

So whether there are patches depends on the maintainer workflow, the
intended source package format, and the surrounding context (eg
sponsorship).  When there are patches, they are made by dgit, which
calls out to git-debrebase as an optimisation.
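To make that dependency concrete, here is a rough Python model of the
git-debrebase cases just described, saying where (if anywhere) the
generated d/patches end up.  It is a hypothetical sketch, not dgit's
code: the function, the enum and the "1.0-with-diff"/"1.0-native"
labels are my own, and it ignores wider context such as sponsorship.

```python
from enum import Enum


class PatchLocation(Enum):
    """Where generated debian/patches end up (hypothetical model)."""
    NONE = "no debian/patches at all"
    MAINTAINER_BRANCH = "committed onto the maintainer branch"
    DGIT_VIEW_ONLY = "only in the separate dgit view"


def git_debrebase_patches(source_format: str, split_view_always: bool) -> PatchLocation:
    """Model the git-debrebase cases above; not dgit's actual logic.

    source_format is "3.0 (quilt)", "3.0 (native)", or one of the labels
    "1.0-with-diff" and "1.0-native" (my own names for the 1.0 flavours).
    """
    if source_format == "3.0 (native)":
        # Unusable with git-debrebase because of the dpkg-source bug [1].
        raise ValueError("git-debrebase cannot be used with 3.0 (native)")
    if source_format in ("1.0-with-diff", "1.0-native"):
        # No debian/patches, ever: just git commits and an output tarball.
        return PatchLocation.NONE
    if source_format == "3.0 (quilt)":
        # dgit's quilt fixup step asks git-debrebase to make the patches.
        if split_view_always:
            # --split-view=always: the synthetic commits that add
            # d/patches stay out of the maintainer branch.
            return PatchLocation.DGIT_VIEW_ONLY
        return PatchLocation.MAINTAINER_BRANCH
    raise ValueError(f"format not covered by this sketch: {source_format}")
```

The point is that none of these inputs appear in the maintainer's own
git-debrebase invocations; they only come together at upload time.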


Relationship to tag2upload:

git-debpush and the tag2upload tag don't know anything about any of
this chaos.  git-debpush simply signs a tag saying "this git branch is
in a format suitable for quilt fixup in linear patches mode".

git-debpush has *no* code to deal with any of the above.  All of this
is left to the tag2upload service.

With a git-based sponsorship workflow, the sponsor may not need to
learn git-debrebase.  They can review the git *tree*, diffing it
against the upstream (ideally, upstream's signed tag), and likewise
they can diff it against the previous upload.  They'll declare the
nicely predictable "linear" quilt mode in their tag.  They can be
sure that the output source package will be precisely the code they've
reviewed in git.

(git-debpush does have one piece of git-debrebase-specific knowledge:
an overrideable sanity check to guard against a user error causing an
anomalous branch state.  It's 9 lines of code, and has nothing to do
with source package construction or package contents.  This sanity
check is not an essential part of git-debpush, and another tag
generation utility, or a human, could omit it.)


ftpmaster's alternative design, AIUI:

(Here I'm going to compare tag2upload with the alternative design
where the uploader signature covers a manifest of all the files in the
unpacked source package - ie, of the result of dpkg-source -x.  The
ftpmasters haven't produced a complete design, but I think I can infer
the properties that a full proposal would have.)
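For concreteness, here is a rough Python sketch of what such a
manifest might look like: a sorted list of relative paths and SHA-256
digests of every regular file in the tree produced by dpkg-source -x.
The ftpmasters haven't specified a format, so the hash, the layout and
the function names are all my assumptions.

```python
import hashlib
import os


def source_manifest(root: str) -> list[tuple[str, str]]:
    """Sorted (relative path, SHA-256 hex digest) pairs for regular files under root.

    Hypothetical format: a real design would also have to decide about
    symlinks, file modes and empty directories, which this sketch skips.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files are hashed here
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            entries.append((rel, digest))
    entries.sort()
    return entries


def manifest_text(entries: list[tuple[str, str]]) -> str:
    """Render in a sha256sum-like "digest  path" layout (my choice)."""
    return "".join(f"{digest}  {rel}\n" for rel, digest in entries)
```

The important thing is the input: the signer needs the *unpacked*
tree, so whatever makes the tag must be able to reproduce everything
dpkg-source -x would do.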

In this alternative design, software making an upload intent tag for a
git-debrebase package would need code to generate the contents of
debian/patches.  Realistically, that means it needs a copy of
git-debrebase.

And the person authorising the upload now needs to learn about, run
and trust git-debrebase, which in our design they often don't.


2. linear quilt mode, especially with NMUs

I'm going to explain this in terms of git-based NMUs.  Similar
situations can arise elsewhere, including in certain (I think
not widely used) maintainer workflows.

When doing an NMU with git, you first obtain a suitable
patches-applied git branch from somewhere.  (Currently `dgit clone` is
the best way to do that, but tag2upload will open up the
possibility[2] of making it be just a `git clone` in the future.)

You then make commit(s) representing your changes, and test them.
(NB that testing them doesn't necessarily involve making a
"3.0 (quilt)" source package.  You can build binaries from git.)

When you're happy, you file the NMUdiff bug report (you can use
git-format-patch or git-diff for this), and you `dgit push-source`.
Note that at no point have you done anything with d/patches.

So at this stage, your git working tree has some applied patches in
d/patches, plus some changes that exist only as git commits.

dgit knows how to figure out *which* git commits need making into
patches, which is a nontrivial problem.  The basic algorithm is to
calculate what the tree looks like if you take the orig tarball and
apply the contents of debian/patches - that gives dgit the tree at the
last upload.  Then dgit walks backwards through the git history hoping
to find a commit whose tree matches that last upload.  Then it can
walk forward again and make patches out of the commits made since.
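The core of that search can be sketched in Python over a toy model,
where a tree is a dict from path to text and the history is a single
linear list of commits.  Everything here (names, data model, skipping
debian/) is my own illustrative assumption; real dgit works on git
objects, handles many cases this ignores, and makes the actual patches
with dpkg-source, as described next.

```python
import difflib
from dataclasses import dataclass

# A tree maps path -> text contents.  Real trees are git objects; this is a toy.
Tree = dict[str, str]


@dataclass
class Commit:
    id: str
    message: str
    tree: Tree


def tree_diff(old: Tree, new: Tree) -> str:
    """Unified diff of non-debian/ files; missing files are treated as empty."""
    out = []
    for path in sorted(set(old) | set(new)):
        if path.startswith("debian/"):
            continue  # sketch assumption: patches never touch debian/
        before, after = old.get(path, ""), new.get(path, "")
        if before != after:
            out.extend(difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile="a/" + path, tofile="b/" + path))
    return "".join(out)


def patches_since_upload(history: list[Commit], uploaded: Tree) -> list[tuple[str, str]]:
    """history is linear, oldest first; uploaded is orig + debian/patches applied.

    Returns (subject, diff) pairs, one per later commit that changes
    anything outside debian/.
    """
    # Walk backwards looking for the commit whose tree matches the last upload.
    for base in range(len(history) - 1, -1, -1):
        if history[base].tree == uploaded:
            break
    else:
        raise LookupError("no commit matches the last upload")
    # Walk forwards again, turning each subsequent commit into a patch.
    patches = []
    for parent, child in zip(history[base:], history[base + 1:]):
        diff = tree_diff(parent.tree, child.tree)
        if diff:
            patches.append((child.message.partition("\n")[0], diff))
    return patches
```

If no commit matches, the sketch gives up; that is the situation in
which linearisation isn't possible at all.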

There's more.  dgit wants to make patches that the NMU recipient won't
object to.  So we can't just use gbp pq, because some maintainers
don't like its output and want patches closer to DEP-3 format.
Therefore, dgit makes these patches by calling `dpkg-source --commit`
with a stunt value of `EDITOR`.

Again, all of this is only necessary with "3.0 (quilt)".  It also
depends on the archive contents - it's important to be using the orig
tarball from the archive.

Finally, did you know that dpkg-source and git can disagree about the
meaning of patches?  There are patches that dpkg-source can apply, but
which git fails on.  There are also patches that they *both* apply,
but *disagree* about the meaning of!  Real packages, including highly
important core packages, are sometimes afflicted.  dgit has code in it
to deal with that too.


Relationship to tag2upload:

Once again, git-debpush and the tag2upload tag don't know anything
about any of this chaos.  git-debpush simply signs a tag saying "this
git branch is in a format suitable for quilt fixup in linear patches
mode".


ftpmaster's alternative design, AIUI:

In the alternative design it is probably not feasible to support NMUs
of arbitrary "3.0 (quilt)" packages.

Likewise, maintainer workflows that rely on dgit's sophisticated
git-to-quilt linearisation algorithm are not supportable either.


3. gbp

git-buildpackage and gbp pq, and its patches-unapplied branch format,
are probably the most common workflow in Debian right now.

With gbp pq, the maintainer's DEP-14 tag (the tag2upload tag) is on
that unapplied branch.  With a "3.0 (quilt)" source package, it is not
actually strictly necessary to apply the patches to make the source
package, since the applied form of the files is not directly
represented.  Instead, dpkg-source applies the patches on extraction.

But there is a wrinkle.  gbp inherits a bug in dpkg-source[4]: if the
maintainer has edited the upstream .gitignore, in their git
representation, this is *not* represented in the source package
generated by git-buildpackage.  IMO this is a clear DFSG violation[5].

If the maintainer uses `dgit push-source --quilt=gbp`, dgit will spot
this situation and make an additional patch in debian/patches,
representing the maintainer's edits to .gitignore.  That patch appears
only in the canonical git branch and the source package, not in the
maintainer's view of debian/patches.


How does this relate to tag2upload?

The tag2upload git tag does not contain any detailed information about
any of this.  It simply specifies that the quilt mode `gbp` should be
used.  The tag2upload server does all the work.

(git-debpush *does* contain an overrideable sanity check that upstream
files match and the patches apply.  Again, this is not an essential
part of its functionality and another signing tool wouldn't need it.)


ftpmaster's alternative design, AIUI:

The alternative design I've been positing includes a manifest of
the contents of the unpacked source package, ie with patches
applied.

In that alternative design, any utility which wanted to make an upload
intent tag would need to be able to apply the patches.  The patch
application code becomes an essential part of the tag generation
software.

Also, the tag generation utility would need to have special knowledge
about .gitignore.  There are two options here: (1) have code to find
the upstream .gitignores, compare them with the maintainer's
.gitignores, and generate a synthetic patch.  Or, (2) find the
upstream .gitignores and arrange to include the hashes of the upstream
.gitignores rather than the maintainer's .gitignores in the manifest
(which IMO violates the DFSG [5]).  In either case the tag generation
utility needs special knowledge about gbp's .gitignore behaviour.  Or,
of course, (3) we could forbid maintainers from editing or adding
.gitignore files in the upstream part of the package.


4. git-debcherry

git-debcherry is an interesting git patch workflow utility.  It is not
currently supported by dgit, but that's not because it's impossible,
or even particularly difficult.  We just haven't got around to it. [6]

I don't fully understand git-debcherry, but AIUI the basic principle
is that it is a tool for constructing debian/patches based on a
patches-applied maintainer branch.  It has an interesting algorithm
with some nice properties, including that it doesn't constrain the
maintainer git branch structure.

Only git-debcherry knows what patches it's going to produce, and
it takes the orig tarball as an input.

Support in dgit would be to have dgit call git-debcherry at an
appropriate point in the source package construction (during what dgit
calls "quilt fixup").


Relationship to tag2upload:

tag2upload doesn't support this yet, but it could do.  We would add
the support in dgit, and when that was deployed to the tag2upload
server, git-debcherry would be useable with tag2upload right away.

As with the other workflows, git-debpush wouldn't need any code
specific to git-debcherry.  Like the other patches-applied workflows,
the authorising uploader (eg, a sponsor) does not need to understand,
or run, git-debcherry.


ftpmaster's alternative design, AIUI:

git-debcherry uses the orig tarball, so it couldn't be supported,
since the uploading developer doesn't have any tarballs.

It might be supportable if we also made changes to git-debcherry, to
allow it to work off an upstream git tag instead.


5. language team monorepos

Several teams handling upstream language-specific package managers
have a monorepo on salsa containing metadata and patches.  I'm aware
of at least Rust and Haskell working this way.  The precise contents
of the monorepo vary, and each team has team-specific tooling.

The fragmentation is a problem, and the workflows can be very awkward.
Typically .dscs are constructed on maintainer laptops using
team-specific tooling, taking both the team monorepo and upstream
artifacts as inputs.

None of these are supported by `dgit push-source` right now.  It would
be nice to be able to improve this, by formalising and streamlining
the conversion process including source package construction.  I think
that would be possible in principle, but the design space is large and
as far as I'm aware there haven't been any serious conversations,
involving both source handling experts (like the dgit team) and
multiple monorepo packaging teams, about common aspects of their
workflows, differing requirements, etc.

(I should say that at least for Rust, which I know very well, I have
serious doubts as to whether the monorepo is the right approach, but
that's a whole other can of worms.)


Relationship to tag2upload:

If we deploy tag2upload, we'll be greatly streamlining the usual
upload case.  This will increase the gap between the existing monorepo
workflows on the one hand, and the majority of packages (which are
supported by tag2upload) on the other hand.

The potential gains from improving the monorepo workflows will be
bigger, and also more evident to a wider set of people.

In summary, supporting monorepo team(s) with more-git-based workflows
is probably possible, in the medium to long term.  I think it's likely
to happen with tag2upload.


ftpmaster's alternative design, AIUI:

I think the alternative design couldn't ever handle multi-package
monorepos in the style of the Rust or Haskell teams.



Ian.


Footnotes.


[1] dpkg-source hates "3.0 (native)" with non-native version,
despite TC request to please allow it:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=737634#107

[2] To support NMUs based on just "git clone" we'd need to start
importing every non-git-based[3] .dsc into git.  That isn't sensible
until the git repository and associated infrastructure have been
scaled up, which will happen as git-based .dscs become more common;
that in turn will be an effect of tag2upload.

[3] By "git-based" I mean that the .dsc tells you which git commit it
was made from, and the git tags etc. tell you how.  I don't mean to
include ad-hoc source package construction from untraceable git trees
using untracked software on maintainer laptops.

[4] The dpkg-source bug about the .gitignore DFSG violation:
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908747

[5] Reading the bug report[4] it's clear that not everyone agrees that
discarding our .gitignore changes is a DFSG violation.  I find that
position quite implausible but I'm hoping we don't need to resolve it
here.

[6] dgit feature request ticket "want dgit --quilt=debcherry"
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=930881
 

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
