Hello fellow Gentoo developers and subscribers of the gentoo-dev mailing list,

I've been wanting to write this email for a while but never got round to it
for lack of motivation and time.

This email covers several topics, all revolving around git. I first want to go
over some basic concepts about git and GitHub, and explain why we should be
doing things differently if we want to avoid cluttering up our repository with
useless commits.

* Background

As you know, a little while ago we migrated the main tree to git, the
revision control tool which needs no introduction. A few months after the
migration, our repository was mirrored over to GitHub to give the project a bit
more exposure to what some developers refer to as "the GitHub generation". The
response from the community was extraordinary and, as a result, a massive
number of Pull Requests came our way. We soon took on the duty of triaging and
merging PRs, and started to make what the git world calls "merge commits".
Understanding merge commits requires understanding how GitHub considers a
contribution.

When a contributor sends a PR via GitHub, he essentially creates a separate
branch, works on it and eventually submits it. For those of you familiar with
git or who've already filed PRs on GitHub, this is old news. However, there
are a number of different ways to deal with PRs on the receiving end (us) in
order to keep a sane log history (a graph, actually).

When we first started working with git, and GitHub, the tendency was to rely on
merge commits to merge contributions back into the main repo. In my opinion,
this was, and still is, a bad idea. What's so special about merge commits? 

* A short walk through merge commits

As you may know, merging one branch into another often results in creating a
new commit. This commit is called a "merge commit" in git jargon. Let's pick
for instance cf4cce36684de5e449ec60bde3421fa0e27bac74. I'm not trying to put
the blame on a particular developer, we've all used merge commits at one point
or another and I was one of the first! In the log graph, this commit is
displayed as such:

$ git log --graph --oneline master
[snip]
* | |   cf4cce3 Merge remote-tracking branch 'github/pr/1845'
|\ \ \
| * | | abf61de net-im/ejabberd: require <dev-lang/erlang-19
* | | | 72c688f app-cdr/xcdroast: remove old revisions
* | | | ced099c package.mask: update xcdroast p.mask
[snip]

The problem here is twofold. First off, we've created a commit which is
pretty much meaningless. Merge commits often tell a story which says nothing
interesting: Merge remote-tracking branch 'github/pr/1845'. OK, that's great,
but do we care? Not really.

The second problem stems from the very nature of merge commits. A merge commit
has two parents: the tip of master at the time of the merge, and the tip of the
merged branch. That branch, in turn, forks off from whatever the tip of master
was when the contributor created it and started working on his contribution.
git log also displays on the left-hand side what I shall call "rails" (no, I'm
not a Ruby developer). A rail is essentially a path leading back from a merge
commit, through its parents, to the point where the branch forked off. It is
meant to be a visual aid to help you work out when two branches veered off and
eventually got merged back together. As you might have noticed by running git
log yourself in the Gentoo git repo and looking back 6 months or a year ago,
there are rails all over the place, overlapping each other. Why does it
happen?

As I just explained above, each merged branch forks off from the tip of master.
But because PRs, i.e. branches, are each created at a different time, that fork
point is different for each of them. When merging by using a merge commit, git
records this information, and git log tries really hard to put it back together
by drawing a rail from each merge commit back to its fork point. This results
in a gigantic and entangled mess shown by git log. I often joke that it looks
like a messy version of the London Tube map: colourful yet upside down.
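
To see this for yourself, here is a minimal sketch in a throwaway repository.
The branch name pr-1234, the file names and the identity are made up for
illustration; only the git commands themselves are real.

```shell
set -eu
# Throwaway repository with a hypothetical identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Larry the Cow"
git config user.email "larry@example.org"

echo base > README
git add README
git commit -q -m "initial commit"
fork_point=$(git rev-parse HEAD)

# The contributor branches off master and commits his change.
git checkout -q -b pr-1234
echo fix > fix.txt
git add fix.txt
git commit -q -m "net-im/ejabberd: require <dev-lang/erlang-19"
pr_tip=$(git rev-parse HEAD)

# Meanwhile, master moves on.
git checkout -q master
echo cleanup > cleanup.txt
git add cleanup.txt
git commit -q -m "app-cdr/xcdroast: remove old revisions"
master_tip=$(git rev-parse HEAD)

# Merge the branch with an explicit merge commit.
git merge -q --no-ff -m "Merge branch 'pr-1234'" pr-1234

# Parents of the merge commit, and the point where the branch forked off.
parents=$(git show -s --format=%P HEAD)
base=$(git merge-base "$master_tip" "$pr_tip")
merge_count=$(git rev-list --merges --count HEAD)

git log --graph --oneline
echo "parents:    $parents"
echo "fork point: $base"
```

The merge commit records both tips as parents, and the rail drawn by git log
runs from the merge all the way back to the fork point. Multiply that by a few
hundred PRs and you get the Tube map.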

In some open source projects, it makes sense to leverage merge commits. The
Linux kernel comes to mind for instance. In this case, merge commits are a good
way to track changes coming from a different branch. Given the sheer number of
contributors working on the Linux kernel, this is useful information for
someone new willing to tackle a new area of the kernel. Figuring out changes
made to a file across several releases is extremely helpful and merge commits
definitely fill this gap. Also, the Linux kernel doesn't have to deal with PRs
since diffs are sent directly to a mailing list.

In the case of Gentoo though, it makes no sense. We should strive to keep a
clean and linear history. I have yet to witness developers creating branches
in the Gentoo main repository. Even though the GitHub model considers PRs as
branches, they are in fact casual contributions and should be treated as such.

By avoiding merge commits, we make sure the history stays linear with no
parent/child commits all over the place. It leads us to the two remaining
solutions for dealing with PRs in a clean fashion: cherry-picking and git am.
These two solutions really shine at keeping a sane history.

Cherry-picking is not my go-to solution. It requires a bit of setup and is
clearly tedious: you must know in advance the SHA-1 of the commit(s) you want
to cherry-pick. You must also set up remote repositories, pull from them every
now and then, etc. For a Git newbie, it can be daunting. A few developers often
opt for this solution (hi kensington!), which I do not vouch for.
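
For reference, a rough sketch of the cherry-pick route in a throwaway
repository. The branch pr-1234 stands in for a contributor's branch fetched
from a remote, and all names here are hypothetical. A revision range is used
so that individual SHA-1s need not be typed out.

```shell
set -eu
# Throwaway repository with a hypothetical identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Larry the Cow"
git config user.email "larry@example.org"

echo base > README
git add README
git commit -q -m "initial commit"

# Stand-in for the contributor's branch (two commits).
git checkout -q -b pr-1234
echo one > a.txt
git add a.txt
git commit -q -m "first PR commit"
echo two > b.txt
git add b.txt
git commit -q -m "second PR commit"

# Meanwhile, master moves on.
git checkout -q master
echo three > c.txt
git add c.txt
git commit -q -m "unrelated commit on master"

# Replay every commit that is on pr-1234 but not on master.
git cherry-pick master..pr-1234 > /dev/null

merges=$(git rev-list --merges --count HEAD)
commits=$(git rev-list --count HEAD)
git log --graph --oneline
```

The resulting log is a single straight line: the two PR commits sit on top of
master with no merge commit in sight.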

Eventually, we're left with git am. My favourite choice if you ask me, since it
requires very little work compared to cherry-picking or making merge commits.
You may or may not know about it but a PR can be fetched as a git am-compatible
patch. If you've ever read emails sent by the GitHub bots, they point to a URL
of this form:

https://github.com/gentoo/gentoo/pull/1234.patch

Once fetched, using your favourite HTTP client, the patch can be directly
applied via the git am command onto HEAD of the repository you're dealing with.
There's this common idiom for fetching AND applying a patch all at once (-L
makes curl follow the redirect GitHub issues for .patch URLs):

$ curl -L https://github.com/gentoo/gentoo/pull/1234.patch | git am
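
If you'd like to try git am without touching the network, here is a minimal
sketch in a throwaway repository. git format-patch stands in for downloading
the .patch URL, and the branch name, file names and identities are all
hypothetical.

```shell
set -eu
# Throwaway repository; the committer identity is hypothetical.
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Larry the Cow"
git config user.email "larry@example.org"

echo base > README
git add README
git commit -q -m "initial commit"

# The contributor's work, authored under a different identity.
git checkout -q -b pr-1234
echo fix > fix.txt
git add fix.txt
git commit -q --author="A Contributor <contributor@example.org>" \
    -m "net-im/ejabberd: require <dev-lang/erlang-19"

# Stand-in for fetching the PR: export the branch in mailbox format.
git format-patch --stdout master..pr-1234 > "$patch"

# Apply the patch on top of master.
git checkout -q master
git am -q "$patch"

author=$(git log -1 --format=%an)
committer=$(git log -1 --format=%cn)
merges=$(git rev-list --merges --count HEAD)
git log --graph --oneline
```

The contributor is kept as the author, you end up as the committer, and the
history stays linear.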

* This is where I'm meant to sell you my solution

Ultimately, I've decided to write a tool to leverage this way of fetching PRs
and merging them. The tool is called Gentoo-App-Pram and is available in the
tree:

# emerge Gentoo-App-Pram

It is written in Perl, works fairly well and has been used by a fair (growing?)
number of developers so far. The tool is CLI-based so you will need to feel at
home with the command line.

Once emerged, cd into your Gentoo git repo and type `pram' followed by the PR
number you wish to merge:

$ cd /home/patrice/gentoo
$ pram 1234

pram will then fetch the PR as a patch and display it to you in your
favourite $EDITOR. At this point, you can make any change to the PR, e.g.
editing commit message(s), changing code in-line, etc.

pram also leverages the "Closes:" header. This header is recognised by GitHub,
and Larry the Cow, and automatically closes a PR when found in the body of a
commit message. So for instance, the following header will automatically close
PR 1234: "Closes: https://github.com/gentoo/gentoo/pull/1234". You don't need
to add it manually as pram will do this for you.
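
Should you ever need to add the header by hand, the sketch below shows one
possible way using git interpret-trailers. This is not how pram does it
internally, and the commit message body is made up for illustration.

```shell
set -eu
# Hypothetical commit message: a subject line and a short body.
msg=$(mktemp)
cat > "$msg" <<'EOF'
net-im/ejabberd: require <dev-lang/erlang-19

Restrict the Erlang dependency.
EOF

# Append the header as a trailer at the end of the message.
result=$(git interpret-trailers \
    --trailer "Closes: https://github.com/gentoo/gentoo/pull/1234" < "$msg")

printf '%s\n' "$result"
last_line=$(printf '%s\n' "$result" | tail -n 1)
```

The header ends up on its own line at the bottom of the message, separated
from the body by a blank line.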

After saving and getting out of $EDITOR, pram will ask you, with a yes/no
question, whether the PR should be merged. "y" will launch git am and merge the
patch whereas "n" will abort the operation and clean it up.

That's pretty much it. Make sure to read the man page since there are other
options available (pram --man).

pram wouldn't have been possible without Kent Fredric's help. He's assisted me
in releasing the package on CPAN and contributed a few patches. Kudos to him!

To wrap up:
- Please stop making merge commits. This strategy is not useful in the case of
  Gentoo and does more harm than good.
- Cherry-pick or git-am external contributions such as PRs.
- Better yet, use Gentoo-App-Pram. :-)

If you want to contribute to Gentoo-App-Pram, send me a PR on GitHub at
https://github.com/monsieurp/Gentoo-App-Pram or file a bug report at
https://bugs.gentoo.org and assign it to me.

Comments and suggestions welcome.

Cheers,

-- 
Patrice Clement
Gentoo Linux developer
http://www.gentoo.org
