On Fri, Oct 20, 2017 at 3:40 AM,  <nicolas.mail...@laposte.net> wrote:
> Hi,
>
> Git is a wonderful tool, which has transformed how software is created, and 
> made code sharing and reuse, a lot easier (both between human and software 
> tools).
>
> Unfortunately Git is so good more and more developers start to procrastinate 
> on any activity that happens outside of GIT, starting with cutting releases. 
> The meme "one only needs a git commit hash" is going strong, even infecting 
> institutions like lwn and glibc 
> (https://lwn.net/SubscriberLink/736429/e5a8c8888cc85cc8/)

For a release you would want to include more than just "the code" in
the hash: compiler versions, environment variables, the phase of the
moon, and whatever else may affect the release build.

> However, the properties that make a hash commit terrific at the local 
> development level, also make it suboptimal as a release ID:
>
> – hashes are not ordered. A human can not guess the sequencing of two hashes, 
> nor can a tool, without access to Git history. Just try to handle "critical 
> security problem in project X, introduced with version Y and fixed in Z" when 
> all you have is some git hashes. hashing-only introduces severe frictions 
> when analysing deployment states.

It sounds to me as if you assume that if X, Y and Z were numbers (or
rather had some order), this could be deduced easily.
Wouldn't the output of git-describe be sufficient for an ordering
scheme to rely on?
However, the problem with deployments is that Y might be v1.8.0.1, Z
might be v2.1.2.0, and X (the one you are running) v2.10.2.0.
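
To make the ordering point concrete, here is a minimal sketch (an
illustration, not something Git ships) that turns git-describe style
names into sortable keys. It assumes tags are a "v" followed by
dot-separated integers and that untagged commits are described as
<tag>-<count>-g<hash>; describe_key is a made-up helper name.

```python
import re

# Assumed shape: "v1.2.3", optionally followed by "-<commits>-g<abbrev hash>",
# as printed by "git describe" for a commit that is not itself tagged.
_DESCRIBE = re.compile(r"v(\d+(?:\.\d+)*)(?:-(\d+)-g([0-9a-f]+))?")


def describe_key(name):
    """Return a sortable key: ((major, minor, ...), commits_since_tag)."""
    match = _DESCRIBE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised version name: {name!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    distance = int(match.group(2) or 0)
    return (numbers, distance)


names = ["v2.10.2.0", "v1.8.0.1", "v2.1.2.0-3-gdeadbee", "v2.1.2.0"]
ordered = sorted(names, key=describe_key)
print(ordered)
```

Comparing the numbers as integers matters: a plain string sort would
put a hypothetical v2.10.0 before v2.9.0. The key is only meaningful
along a single line of history; across branches the commit count says
nothing about which commit is newer.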

> — hashes are not ranked. You can not guess, looking at a hash, if it 
> corresponds to a project stability point, or is in a middle of a refactoring 
> sequence, where things are expected to break. Evaluating every hash of every 
> project you use quickly becomes prohibitive, with the only possible strategy 
> being to just use the latest commit at a given time and pray (and if you are 
> lucky never never update afterwards unless you have lots of fixing and 
> testing time to waste).

That is up to the hash function. One could imagine a hash function
that generates bit patterns from which you can derive an order.
SHA-1, which Git uses, is not such a hash, but rather a supposedly
secure one. A single hash value looks like white noise, so the entropy
of a SHA-1 object name can be estimated at 160 bits.
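
As a small illustration of why the name carries no ordering: Git names
a blob by the SHA-1 of a "blob <size>\0" header followed by the
content. The sketch below reproduces that in Python; git_blob_name is
just a name picked for this example.

```python
import hashlib


def git_blob_name(content: bytes) -> str:
    """Compute the SHA-1 object name Git gives a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty = git_blob_name(b"")
# 40 hex digits, 4 bits each: a 160-bit name.
print(empty, len(empty) * 4, "bits")
```

Nothing in the resulting 40 hex digits says when the object was
created or how it relates to any other object.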

> – commit mixing is broken by design.

In Git terms, a repository is the whole universe.
If you want relationships between different projects, you need to
include these projects, e.g. via subtrees or submodules.
This scales even up to Linux distributions (e.g.
https://github.com/gittup/gittup, which includes nethack!)

> One can not adapt the user of a piece of code to changes in this piece of 
> code before those changes are committed in the first place. There will always 
> be moments where the latest commit of a project, is incompatible with the 
> latest commit of downsteam users of this project. It is not a problem in 
> developer environments and automated testers, where you want things to break 
> early and be fixed early. It is a huge problem when you follow the same early 
> commit push strategy for actual production code, where failures are not just 
> a red light in a build farm dashboard, but have real-world consequences. And 
> the more interlinked git repositories you pile on one another, the higher the 
> probability is two commits won't work with one another with failures 
> cascading down

That is software engineering in general; I am not sure how Git relates
to this. Any change that you make (with or without Git) can break the
downstream world.

> – commits are too granular. Even assuming one could build an automated 
> regression farm powerful enough to build and test instantaneously every 
> commit, it is not possible to instantaneously push those rebuilds to every 
> instance where this code is deployed (even with infinite bandwidth, infinite 
> network reach and infinite network availability).

With infinite resources it would be possible, as the computers are
also infinitely fast. ;)

> Computers would be spending their time resetting to the latest build of one 
> component or another, with no real work being done. So there will always be a 
> distance, between the latest commit in a git repo, and what is actually 
> deployed. And we've seen bare hashes make evaluating this distance difficult
>
> – commits are a bad inter-project synchronisation point. There are too many 
> of them, they are not ranked, everyone is choosing a different commit to 
> deploy, that effectively kills the network effects that helped making 
> traditional releases solid (because distributors used the same release state, 
> and could share feedback and audit results).

There are different strategies. Relevant open source projects (kernel,
glibc, git) are pretty good at not breaking their downstream users
with every commit. So "no matter which version of X you use, it ought
to work fine".
If you want higher velocity, you have to couple the projects more
tightly (submodules or a large repo including everything).

> One could mitigate those problems in a Git management overlay (and, indeed, 
> many try). The problem of those overlays is that they have variable maturity 
> levels, make incompatible choices, cut corners, are not universal like Git, 
> making building anything on top of them of dubious value, with quick fallback 
> to commit hashes, which *are* universal among Git repos. Release handling and 
> versioning really needs to happen in Git itself to be effective.

I am not convinced yet. As said initially, release handling needs to
take more things into account (compiler version, hardware version of
the fleet, etc.), which are usually not tracked in Git. Well, you
could track them, but that is the job of the release management tool,
no?

> Please please please add release handling and versioning capabilities to Git 
> itself. Without it some enthusiastic Git adopters are on a fast trajectory to 
> unmanageable hash soup states, even if they are not realising it yet, because 
> the deleterious side effects of giving up on releases only get clear with 
> time.
>
> Here is what such capabilities could look like (people on this list can 
> probably invent something better, I don't care as long as something exists).
>
> 1. "release versions" are first class objects that can be attached to a 
> commit (not just freestyle tags that look like versions, but may be something 
> else entirely). Tools can identify release IDs reliably.

git tags?

> 2. "release versions" have strong format constrains, that allow humans and 
> tools to deduce their ordering without needing access to something else (full 
> git history or project-specific conventions). The usual string of numbers 
> separated by dots is probably simple and universal enough (if you start to 
> allow letters people will try to use clever schemes like alpha or roman 
> numerals, that break automation). There needs to be at least two numbers in 
> the string to allow tracking patchlevels.

Git tags are pretty open-ended in their naming. Any strictness would
need to be enforced according to the requirements of the environment.
(Some want just one integer going up; others want patch levels, i.e.
4 ints; yet others want dates?)
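
A rough sketch of what such environment-specific enforcement could
look like, for instance when called from a server-side hook. The three
policies and the names POLICIES and tag_allowed are assumptions made
up for this example; Git itself does not define them.

```python
import re

# Hypothetical naming policies; a real project would pick exactly one.
POLICIES = {
    "counter": re.compile(r"\d+"),                  # 1, 2, 3, ...
    "dotted4": re.compile(r"v\d+\.\d+\.\d+\.\d+"),  # v2.10.2.0
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),       # 2017-10-20
}


def tag_allowed(tag, policy):
    """Return True if the whole tag name matches the chosen policy."""
    return POLICIES[policy].fullmatch(tag) is not None


print(tag_allowed("v2.10.2.0", "dotted4"))
print(tag_allowed("v2.10", "dotted4"))
```

The point is that the policy lives outside Git: each environment
plugs in whichever pattern it needs.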

> 3. several such objects can be attached to a commit (a project may wish to 
> promote a minor release to major one after it passes QA, versionning history 
> should not be lost).

Multiple git tags can be attached to the same commit. You can even tag
a tag or tag a blob.


> 4. absent human intervention the release state of a repo is initialised at 
> 0.0, for its first commit (tools can rely on at least one release existing in 
> a repo).

An initial repo doesn't have tags, which comes close to 0.

>
> 5. a command, such as "git release", allow a human with control of the repo 
> to set an explicit release version to a commit. Git enforces ordering 
> (refuses versions lower than the latest repo version in git history). The 
> most minor number of the explicit release is necessarily zero.
>
> 6. a command, such as "git release" without argument, allows a human to 
> request setting of a minor patchlevel release version for the current commit. 
> The computed version is:
>    "last release version in git history except most minor number"
>  + "."
>  + "number of commits in history since this version"
> (patchlevel versioning is predictable and decentralized, credits to Willy 
> Tarreau for the idea)
>
> 7. a command, such as "git release bump", allows a human to request setting 
> of a new non-patchlevel release version. The computed version is
>    "last release version in git history except most minor number, 
> incrementing the remaining most minor number"
>  + "."
>  + "0"
>
> 8. a command, such as "git release promote", allows a human to request 
> setting a new more major release version. The computed version is
>    "last release version in git history except most minor number, 
> incrementing the next-to-remaining-most-minor-and-non-zero number, and 
> resetting the remaining-most-minor-and-non-zero number to zero"
>  + "."
>  + "0"

This sounds fairly specific to the environment that you are in; maybe
write git-release for your environment and then open source it. The
world will love it (assuming they have the same environment and
needs).
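
For what it is worth, the arithmetic in points 6 and 7 is small enough
to prototype outside Git. The sketch below is one reading of the
proposal, not an existing command; parse, fmt, patchlevel and bump are
made-up names, and the commit count is assumed to come from something
like "git rev-list --count <tag>..HEAD". Point 8 is left out because
its wording admits more than one reading.

```python
def parse(version):
    """'1.2.0' -> (1, 2, 0); at least two numbers, as in point 2."""
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) < 2:
        raise ValueError("need at least two numbers")
    return parts


def fmt(parts):
    """(1, 2, 0) -> '1.2.0'."""
    return ".".join(str(p) for p in parts)


def patchlevel(last_release, commits_since):
    """Point 6: keep all but the most minor number, append the commit count."""
    prefix = parse(last_release)[:-1]
    return fmt(prefix + (commits_since,))


def bump(last_release):
    """Point 7: increment the last remaining number, then append 0."""
    prefix = parse(last_release)[:-1]
    return fmt(prefix[:-1] + (prefix[-1] + 1, 0))


print(patchlevel("1.2.0", 5))
print(bump("1.2.5"))
```

A wrapper script that reads the latest tag and creates the next one
would be enough to try the scheme on a real project.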


> 9. a command, such as "git release cut", creates a release archive, named 
> reponame-releaseversion.tar.xz, with a reponame-releaseversion root 
> directory, a reponame-releaseversion/VERSION file containing releaseversion 
> (so automation like makefiles can synchronize itself with the release version 
> state), removing git metadata (.git tree) from the result. If the current 
> commit has several release objects attached the highest one in ordering is 
> chosen. If the current commit is lacking a release object a new minor 
> patchlevel release version is autogenerated. Archive compression format can 
> be overridden in repo config.

git-archive comes to mind; it does a subset of this.
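
A sketch of the part that git-archive already covers, namely the
archive with a reponame-releaseversion/ root directory. The helper
name archive_command and its parameters are assumptions for this
example; the VERSION file and the xz compression from the proposal are
not handled here and would need an extra step.

```python
def archive_command(repo_name, version, commit="HEAD"):
    """Build the argv for a tar archive with a name-version/ root directory."""
    stem = f"{repo_name}-{version}"
    return [
        "git", "archive",
        "--format=tar",
        f"--prefix={stem}/",      # trailing slash makes it a directory
        f"--output={stem}.tar",
        commit,
    ]


print(" ".join(archive_command("myproject", "1.2.5")))
```

Inside a repository this list could be handed to subprocess.run(...,
check=True); the resulting tarball contains no .git directory.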

> 10. a command, such as "git release translate", outputs the commit hash 
> associated to the version given in argument if it exists, the version 
> associated to the commit hash given in argument if it exists, the version 
> associated to the current commit without argument. If it is translating 
> commit hashes with no version it outputs the various versions that could be 
> computed for this hash by git release, git release bump, git release promote. 
> This is necessary to bridge developer-oriented tools, that will continue to 
> talk in commit hashes, and release/distribution/security-audit oriented 
> tools, that want to manipulate release versions
>
> 11. when no releasing has been done in a repo for some time (I'd suggest 3 
> months to balance freshness with churn, it can be user-overidable in repo 
> config), git reminds its human masters at the next commit events they should 
> think about stabilizing state and cutting a release.

This is all process that is specific to your environment. Consider
e.g. the C++ standard committee tracking the C++ Standard in Git:
https://isocpp.org/std/the-committee
They make a release every 10 years or so, so 3 months is way off!
Other examples could be found where 3 months is way too long.

> So nothing terribly complex, just a lot a small helpers to make releasing 
> easier, less tedious, and cheaper for developers, that formalize, automate, 
> and make easier existing practices of mature software projects, making them 
> accessible to smaller projects. They would make releasing more predictable 
> and reliable for people deploying the code, and easier to consume by 
> higher-level cross-project management tools. That would transform the 
> deployment stage of software just like Git already transformed early code 
> writing and autotest stages.

Integrating with CI and release tooling is definitely important, but
Git itself has no idea about the requirements and environment of a
specific project, which is why it has not been developed much on that
front.
For example, the contribution process for git (as well as Linux) is
done via email and partially via pull requests. These workflows have
pretty good integration in Git via "format-patch" and "request-pull".
There may be other workflows that are not as nicely integrated because
they did not occur to the Git developers.

Thanks,
Stefan


>
> Best regards,
>
> --
> Nicolas Mailhot
>
