On 4/30/2018 7:32 PM, Jakub Narebski wrote:
Derrick Stolee <dsto...@microsoft.com> writes:

We now calculate generation numbers in the commit-graph file and use
them in paint_down_to_common().

Expand the section on generation numbers to discuss how the three
special generation numbers GENERATION_NUMBER_INFINITY, _ZERO, and
_MAX interact with other generation numbers.

Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
Looks good.

---
  Documentation/technical/commit-graph.txt | 30 +++++++++++++++++++-----
  1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index 0550c6d0dc..d9f2713efa 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -77,6 +77,29 @@ in the commit graph. We can treat these commits as having 
"infinite"
  generation number and walk until reaching commits with known generation
  number.
+We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
+in the commit-graph file. If a commit-graph file was written by a version
+of Git that did not compute generation numbers, then those commits will
+have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
+
+Since the commit-graph file is closed under reachability, we can guarantee
+the following weaker condition on all commits:
+
+    If A and B are commits with generation numbers N amd M, respectively,
+    and N < M, then A cannot reach B.
+
+Note how the strict inequality differs from the inequality when we have
+fully-computed generation numbers. Using strict inequality may result in
+walking a few extra commits,
The linux kernel commit graph has maximum of 513 commits sharing the
same generation number, but is is 5.43 commits sharing the same
generation number on average, with standard deviation 10.70; median is
even lower: it is 2, with 5.35 median absolute deviation (MAD).

So on average it would be a few extra commits.  Right.

                               but the simplicity in dealing with commits
+with generation number *_INFINITY or *_ZERO is valuable.
As I wrote before, handling those corner cases in more complicated, but
not that complicated.  We could simply use stronger condition if both
generation numbers are ordinary generation numbers, and weaker condition
when at least one generation number has one of those special values.

+
+We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
+generation numbers are computed to be at least this value. We limit at
+this value since it is the largest value that can be stored in the
+commit-graph file using the 30 bits available to generation numbers. This
+presents another case where a commit can have generation number equal to
+that of a parent.
Ordinary generation numbers, where stronger condition holds, are those
between GENERATION_NUMBER_ZERO < gen(C) < GENERATION_NUMBER_MAX.

+
  Design Details
  --------------
@@ -98,17 +121,12 @@ Future Work
  - The 'commit-graph' subcommand does not have a "verify" mode that is
    necessary for integration with fsck.
-- The file format includes room for precomputed generation numbers. These
-  are not currently computed, so all generation numbers will be marked as
-  0 (or "uncomputed"). A later patch will include this calculation.
-
Good.

  - After computing and storing generation numbers, we must make graph
    walks aware of generation numbers to gain the performance benefits they
    enable. This will mostly be accomplished by swapping a commit-date-ordered
    priority queue with one ordered by generation number. The following
-  operations are important candidates:
+  operation is an important candidate:
- - paint_down_to_common()
      - 'log --topo-order'
Another possible candidates:

        - remove_redundant() - see comment in previous patch
        - still_interesting() - where Git uses date slop to stop walking
          too far

remove_redundant() will be included in v5, thanks.

Instead of "still_interesting()" I'll add "git tag --merged" as the candidate to consider, as discussed in [1].

[1] https://public-inbox.org/git/87fu3g67ry....@lant.ki.iif.hu/t/#u
    "branch --contains / tag --merged inconsistency"


- Currently, parse_commit_gently() requires filling in the root tree
One important issue left is handling features that change view of
project history, and their interaction with commit-graph feature.

What would happen, if we turn on commit-graph feature, generate commit
graph file, and then:

   * use graft file or remove graft entries to cut history, or remove cut
     or join two [independent] histories.
   * use git-replace mechanims to do the same
   * in shallow clone, deepen or shorten the clone

What would happen if without re-generating commit-graph file (assuming
tha Git wouldn't do it for us), we run some feature that makes use of
commit-graph data:

   - git branch --contains
   - git tag --contains
   - git rev-list A..B


The commit-graph is not supported in these scenarios (yet). grafts are specifically mentioned in the future work section.

I'm not particularly interested in supporting these features, so they are good venues for other contributors to get involved in the commit-graph feature. Eventually, they will be blockers to making the commit-graph feature a "default" feature. That is when I will pay attention to these situations. For now, a user must opt-in to having a commit-graph file (and that same user has possibly opted in to these history modifying features).

Thanks,
-Stolee

Reply via email to