Hi,

Git is a wonderful tool that has transformed how software is created and
made code sharing and reuse a lot easier (both between humans and between
software tools).

Unfortunately, Git is so good that more and more developers procrastinate on
any activity that happens outside of Git, starting with cutting releases. The
meme "one only needs a git commit hash" is going strong, even infecting
institutions like LWN and glibc
(https://lwn.net/SubscriberLink/736429/e5a8c8888cc85cc8/).

However, the properties that make a commit hash terrific at the local
development level also make it suboptimal as a release ID:

– hashes are not ordered. A human cannot guess the sequencing of two hashes,
nor can a tool, without access to Git history. Just try to handle "critical
security problem in project X, introduced with version Y and fixed in Z" when
all you have is some git hashes. Relying on hashes alone introduces severe
friction when analysing deployment states.

– hashes are not ranked. You cannot guess, looking at a hash, whether it
corresponds to a project stability point or sits in the middle of a
refactoring sequence, where things are expected to break. Evaluating every
hash of every project you use quickly becomes prohibitive, and the only
remaining strategy is to use the latest commit at a given time and pray (and,
if you are lucky, never update afterwards unless you have lots of fixing and
testing time to waste).

– commit mixing is broken by design. One cannot adapt the user of a piece of
code to changes in this piece of code before those changes are committed in
the first place. There will always be moments where the latest commit of a
project is incompatible with the latest commit of downstream users of this
project. It is not a problem in developer environments and automated testers,
where you want things to break early and be fixed early. It is a huge problem
when you follow the same early commit push strategy for actual production
code, where failures are not just a red light in a build farm dashboard, but
have real-world consequences. And the more interlinked git repositories you
pile on one another, the higher the probability that two commits won't work
with one another, with failures cascading down.

– commits are too granular. Even assuming one could build an automated
regression farm powerful enough to build and test every commit
instantaneously, it is not possible to instantaneously push those rebuilds to
every instance where this code is deployed (even with infinite bandwidth,
infinite network reach and infinite network availability). Computers would
spend their time resetting to the latest build of one component or another,
with no real work being done. So there will always be a distance between the
latest commit in a git repo and what is actually deployed. And we've seen
that bare hashes make evaluating this distance difficult.

– commits are a bad inter-project synchronisation point. There are too many of
them, they are not ranked, and everyone chooses a different commit to deploy.
That effectively kills the network effects that helped make traditional
releases solid (because distributors used the same release state, and could
share feedback and audit results).
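
To illustrate the ordering point, here is a minimal Python sketch of the
"introduced in Y, fixed in Z" check. It assumes release IDs are plain dotted
numbers; the names parse_version and is_affected are hypothetical, not part of
Git or of any existing tool. The same question cannot be answered from two
bare hashes without walking the repository history.

```python
def parse_version(text):
    """Turn a dotted string such as "1.4.12" into a tuple of ints."""
    parts = text.split(".")
    if len(parts) < 2 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a dotted numeric version: {text!r}")
    return tuple(int(p) for p in parts)


def is_affected(deployed, introduced, fixed):
    """True if `deployed` lies in the half-open range [introduced, fixed)."""
    low = parse_version(introduced)
    high = parse_version(fixed)
    return low <= parse_version(deployed) < high


# Components are compared as numbers, so 1.10.0 sorts after 1.5.0
# (a plain string comparison would get this wrong).
print(is_affected("1.4.12", introduced="1.4.0", fixed="1.5.0"))
print(is_affected("1.10.0", introduced="1.4.0", fixed="1.5.0"))
```

No access to the repository is needed: the three strings are enough.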

One could mitigate those problems in a Git management overlay (and, indeed,
many try). The problem with those overlays is that they have variable maturity
levels, make incompatible choices, cut corners, and are not universal like
Git. That makes building anything on top of them of dubious value, with a
quick fallback to commit hashes, which *are* universal among Git repos.
Release handling and versioning really need to happen in Git itself to be
effective.

Please please please add release handling and versioning capabilities to Git
itself. Without them some enthusiastic Git adopters are on a fast trajectory
to unmanageable hash soup states, even if they do not realise it yet, because
the deleterious side effects of giving up on releases only become clear with
time.

Here is what such capabilities could look like (people on this list can 
probably invent something better, I don't care as long as something exists).

1. "release versions" are first class objects that can be attached to a commit 
(not just freestyle tags that look like versions, but may be something else 
entirely). Tools can identify release IDs reliably.

2. "release versions" have strong format constraints, which allow humans and
tools to deduce their ordering without needing access to something else (full
git history or project-specific conventions). The usual string of numbers
separated by dots is probably simple and universal enough (if you start to
allow letters, people will try to use clever schemes like alpha or roman
numerals, which break automation). There need to be at least two numbers in
the string to allow tracking patchlevels.

3. several such objects can be attached to a commit (a project may wish to
promote a minor release to a major one after it passes QA; versioning history
should not be lost).

4. absent human intervention the release state of a repo is initialised at 0.0, 
for its first commit (tools can rely on at least one release existing in a 
repo).

5. a command, such as "git release", allows a human with control of the repo
to set an explicit release version on a commit. Git enforces ordering (refuses
versions lower than the latest repo version in git history). The most minor
number of the explicit release is necessarily zero.

6. a command, such as "git release" without argument, allows a human to request 
setting of a minor patchlevel release version for the current commit. The 
computed version is:
   "last release version in git history except most minor number"
 + "."
 + "number of commits in history since this version"
(patchlevel versioning is predictable and decentralized, credits to Willy 
Tarreau for the idea)

7. a command, such as "git release bump", allows a human to request setting of 
a new non-patchlevel release version. The computed version is
   "last release version in git history except most minor number, incrementing 
the remaining most minor number"
 + "."
 + "0"

8. a command, such as "git release promote", allows a human to request setting 
a new more major release version. The computed version is
   "last release version in git history except most minor number, incrementing 
the next-to-remaining-most-minor-and-non-zero number, and resetting the 
remaining-most-minor-and-non-zero number to zero"
 + "."
 + "0"

9. a command, such as "git release cut", creates a release archive, named 
reponame-releaseversion.tar.xz, with a reponame-releaseversion root directory, 
a reponame-releaseversion/VERSION file containing releaseversion (so automation 
like makefiles can synchronize itself with the release version state), removing 
git metadata (.git tree) from the result. If the current commit has several 
release objects attached the highest one in ordering is chosen. If the current 
commit is lacking a release object a new minor patchlevel release version is 
autogenerated. Archive compression format can be overridden in repo config.

10. a command, such as "git release translate", outputs the commit hash
associated with the version given in argument if it exists, the version
associated with the commit hash given in argument if it exists, or the version
associated with the current commit when given no argument. If it is
translating a commit hash with no version, it outputs the various versions
that could be computed for this hash by git release, git release bump, and git
release promote. This is necessary to bridge developer-oriented tools, which
will continue to talk in commit hashes, and
release/distribution/security-audit oriented tools, which want to manipulate
release versions.

11. when no releasing has been done in a repo for some time (I'd suggest 3
months to balance freshness with churn; it can be user-overridable in repo
config), git reminds its human masters, at the next commit event, that they
should think about stabilizing state and cutting a release.
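
The version arithmetic of items 5 to 8 can be sketched in a few lines of
Python. This is only one reading of the rules above, not an implementation of
anything Git provides: all function names are hypothetical, versions are
tuples of integers, commits_since is assumed to count commits since the last
explicit (zero-ending) release, and promote is assumed to fail when there is
no more major number to increment.

```python
def accepts(candidate, latest):
    """Item 5: an explicit version ends in zero and is not lower than latest."""
    return candidate[-1] == 0 and candidate >= latest


def patchlevel(last_release, commits_since):
    """Item 6: keep all but the most minor number, append the commit count."""
    return last_release[:-1] + (commits_since,)


def bump(last_release):
    """Item 7: drop the most minor number, increment the new last one, add 0."""
    head = last_release[:-1]
    return head[:-1] + (head[-1] + 1, 0)


def promote(last_release):
    """Item 8: increment the number just above the last non-zero one."""
    head = list(last_release[:-1])
    nonzero = [i for i, n in enumerate(head) if n != 0]
    if not nonzero or nonzero[-1] == 0:
        # Assumption: the proposal does not say what happens in this case.
        raise ValueError("no more major number to promote into")
    i = nonzero[-1]
    head[i - 1] += 1   # increment the next more major number
    head[i] = 0        # reset the number that was promoted from
    return tuple(head) + (0,)


def fmt(version):
    """Render a version tuple as the usual dotted string."""
    return ".".join(str(n) for n in version)


last = (1, 2, 0)
print(fmt(patchlevel(last, 5)))   # 1.2.5
print(fmt(bump((1, 2, 5))))       # 1.3.0
print(fmt(promote((1, 2, 5))))    # 2.0.0
print(fmt(bump((0, 0))))          # 1.0, first bump from the initial state
```

Because every result is computed from the previous version and a commit count
only, two people working on the same history arrive at the same numbers
without coordinating.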

So nothing terribly complex, just a lot of small helpers to make releasing
easier, less tedious, and cheaper for developers. They formalize, automate,
and simplify the existing practices of mature software projects, making them
accessible to smaller projects. They would make releasing more predictable and
reliable for people deploying the code, and easier to consume by higher-level
cross-project management tools. That would transform the deployment stage of
software just like Git already transformed the early code writing and autotest
stages.

Best regards,

-- 
Nicolas Mailhot
