On 05/21/2018 08:10 PM, Derrick Stolee wrote:
> [...]
> In the Discussion section of the `git merge-base` docs [1], we have the
> following:
>
> When the history involves criss-cross merges, there can be more than
> one best common ancestor for two commits. For example, with this topology:
>
>     ---1---o---A
>         \ /
>          X
>         / \
>     ---2---o---o---B
>
> both 1 and 2 are merge-bases of A and B. Neither one is better than
> the other (both are best merge bases). When the --all option is not
> given, it is unspecified which best one is output.
>
> This means our official documentation mentions that we do not have a
> concrete way to differentiate between these choices. This makes me think
> that this change in behavior is not a bug, but it _is_ a change in
> behavior. It's worth mentioning, but I don't think there is any value in
> making sure `git merge-base` returns the same output.
>
> Does anyone disagree? Is this something we should solidify so we always
> have a "definitive" merge-base?
> [...]

This may be beyond the scope of what you are working on, but there are
significant advantages to selecting a "best" merge base from among the
candidates. Long ago [1] I proposed that the "best" merge base is the
merge base candidate that minimizes the number of non-merge commits that
are in

    git rev-list $candidate..$branch

that are already in master:

    git rev-list $master

(assuming merging branch into master), which is equivalent to choosing
the merge base that minimizes

    git rev-list --count $candidate..$branch

In fact, this criterion is symmetric if you exchange branch ↔ master,
which is a nice property, and it generalizes pretty simply to
computing the merge base of more than two commits.
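
For illustration only, here is a minimal Python sketch of the selection
step described above. It is not the script used at GitHub; the names
git_lines, count_commits and best_merge_base are hypothetical. It assumes
the candidates come from `git merge-base --all` and that each one is
scored with `git rev-list --count $candidate..$branch`. Ties are left to
input order here.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def git_lines(*args: str) -> List[str]:
    """Run a git command in the current repository; return non-empty output lines."""
    out = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout
    return [line for line in out.splitlines() if line]


def count_commits(candidate: str, branch: str) -> int:
    """Value printed by `git rev-list --count candidate..branch`."""
    return int(git_lines("rev-list", "--count", f"{candidate}..{branch}")[0])


def best_merge_base(
    master: str,
    branch: str,
    candidates: Optional[Iterable[str]] = None,
    count: Callable[[str, str], int] = count_commits,
) -> Optional[str]:
    """Return the candidate with the smallest count, or None if there is none.

    Hypothetical helper: `candidates` and `count` are injectable so the
    selection logic can be exercised without a repository.
    """
    if candidates is None:
        # All merge-base candidates for the two commits.
        candidates = git_lines("merge-base", "--all", master, branch)
    candidates = list(candidates)
    if not candidates:
        return None
    # min() keeps the first of several equal candidates; no tie-breaking here.
    return min(candidates, key=lambda c: count(c, branch))
```

Inside a repository, best_merge_base("master", "topic") would run the two
git commands above for real; "topic" is just a placeholder branch name.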

In that email I also included some data showing that the "best" merge
base almost always results in either the same or a shorter diff than the
more or less arbitrary algorithm that we currently use. Sometimes the
difference in diff length is dramatic.

To me it feels like the best *deterministic* merge base would be based
on the above criterion, maybe with first-parent reachability, commit
times, and SHA-1s used (in that order) to break ties.
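
One way to picture that tie-breaking order is as a sort key. This is only
a sketch under assumptions the text above does not settle: it guesses
that a candidate on the first-parent chain is preferred, that a newer
commit time wins, and that the lexicographically smaller SHA-1 wins. The
Candidate fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    sha: str               # full object name of the merge-base candidate
    count: int             # `git rev-list --count sha..branch`
    on_first_parent: bool  # assumption: reachable by following first parents
    commit_time: int       # committer timestamp, seconds since the epoch


def rank_key(c: Candidate) -> Tuple[int, bool, int, str]:
    """Smaller key is better. The direction of each tie-breaker is an assumption."""
    return (
        c.count,                # primary criterion: fewest commits
        not c.on_first_parent,  # assumed: first-parent reachable sorts first
        -c.commit_time,         # assumed: newer commit sorts first
        c.sha,                  # final, fully deterministic tie-breaker
    )


def pick_deterministic(candidates: List[Candidate]) -> Candidate:
    """Return the single best candidate under rank_key."""
    return min(candidates, key=rank_key)
```

Because the SHA-1 is the last key component, two distinct candidates can
never compare equal, so the result does not depend on input order.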

I don't plan to work on the implementation of this idea myself (though
we've long used a script-based implementation of this algorithm
internally at GitHub).

Michael

[1] https://public-inbox.org/git/[email protected]/
    See the rest of the thread for more interesting discussion.

[2] https://public-inbox.org/git/[email protected]/
    Higher in this thread, Junio proposes a different criterion.