Junio C Hamano <gits...@pobox.com> writes:
> Derrick Stolee <dsto...@microsoft.com> writes:
>
>> Define compare_commits_by_gen_then_commit_date(), which uses generation
>> numbers as a primary comparison and commit date to break ties (or as a
>> comparison when both commits do not have computed generation numbers).
>>
>> Since the commit-graph file is closed under reachability, we know that
>> all commits in the file have generation at most GENERATION_NUMBER_MAX
>> which is less than GENERATION_NUMBER_INFINITY.
>
> I suspect that my puzzlement may be coming from my not "getting"
> what you meant by "closed under reachability",

It means that if commit A is in the commit graph, then all of its
ancestors (all commits reachable from A) are also in the commit graph.

>                                                but could you also
> explain how _INF and _ZERO interact with commits with normal
> generation numbers?  I've always assumed that genno will be used
> only when comparing two commits with valid genno and otherwise we'd
> fall back to the traditional date based one, but...
>
>> +int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
>> void *unused)
>> +{
>> +    const struct commit *a = a_, *b = b_;
>> +
>> +    /* newer commits first */
>> +    if (a->generation < b->generation)
>> +            return 1;
>> +    else if (a->generation > b->generation)
>> +            return -1;
>
> ... this does not check if a->generation is _ZERO or _INF.  
>
> Both being _MAX is OK (the control will fall through and use the
> dates below).  One being _MAX and the other being a normal value is
> also OK (the above comparisons will declare the commit with _MAX is
> farther than less-than-max one from a root).
>
> Or is the assumption that if one has _ZERO, that must have come from
> an ancient commit-graph file and none of the commits have anything
> but _ZERO?

There is stronger and weaker version of the negative-cut criteria based
on generation numbers.

The strong criteria:

  if A != B and gen(A) <= gen(B), then A cannot reach B

The weaker criteria:

  if gen(A) < gen(B), then A cannot reach B


Because commit-graph is closed under reachability, this means that

  if A is in commit graph, and B is outside of it, then A cannot reach B

If A is in commit graph, then either _MAX >= gen(A) >= 1,
or gen(A) == _ZERO.  Because _INFINITY > _MAX > _ZERO, then we have

  if _MAX >= gen(A) >= 1 || gen(A) == 0, and gen(B) == _INFINITY
  then A cannot reach B

which also fullfils the weaker criteria

  if gen(A) < gen(B), then A cannot reach B


If both A and B are outside commit-graph, i.e. gen(A) = gen(B) = _INFINITY,
or if both A and B have gen(A) = gen(B) = _MAX,
or if both A and B come from old commit graph with gen(A) = gen(B) =_ZERO,
then we cannot say anything about reachability... and weak criteria
also does not say anything about reachability.


Maybe the following ASCII table would make it clear.

             |                      gen(B)
             |            ................................ :::::::
gen(A)       | _INFINITY | _MAX     | larger   | smaller  | _ZERO
-------------+-----------+----------+----------+----------+--------
_INFINITY    | =         | >        | >        | >        | >
_MAX         | < Nn      | =        | >        | >        | >
larger       | < Nn      | < Nn     | = n      | >        | >
smaller      | < Nn      | < Nn     | < Nn     | = n      | >
_ZERO        | < Nn      | < Nn     | < Nn     | < Nn     | =

Here "n" denotes stronger condition, and "N" denotes weaker condition.
We have _INFINITY > _MAX > larger > smaller > _ZERO.


NOTE however that it is a *tradeoff*.  Using weaker criteria, with
strict inequality, means that we don't need to handle _INFINITY, _MAX
and _ZERO corner-cases in a special way; but it also means that we would
walk slightly more commits than if we used stronger criteria, with less
or equals.

For Linux kernel public repository commit graph[1] we have maximum of 512
commits sharing the same level, 5.43 sharing the same commit on average,
and 50% of time only 2 commits sharing the same level (median, or 2nd
quartile, or 50% percentile).  This is roughly the amount of commits we
walk more with weaker cut-off condition.

[1]: with 750k commits, but which is not largest commit graph any more :-0

>> +    /* use date as a heuristic when generations are equal */
>> +    if (a->date < b->date)
>> +            return 1;
>> +    else if (a->date > b->date)
>> +            return -1;
>> +    return 0;
>> +}

HTH
-- 
Jakub Narębski

Reply via email to