Re: [PATCH v4 00/13] Serialized Git Commit Graph

Derrick Stolee Mon, 02 Apr 2018 10:54:29 -0700

On 4/2/2018 1:35 PM, Stefan Beller wrote:

On Mon, Apr 2, 2018 at 8:02 AM, Derrick Stolee <sto...@gmail.com> wrote:

I would be happy to review any effort to extend the commit-graph
format to include such indexes, as long as the performance benefits
outweigh the complexity to create them.


[2]
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.719.8396&rep=rep1&type=pdf

The complexity of calculating FELINE index is O(|V| log(|V|) + |E|), the
storage complexity is 2*|V|.

This would be very easy to add as an optional chunk, since it can use one
row per commit.

Given this discussion, I wonder if we want to include generation numbers
as a first class citizen in the current format. They could also go as
an optional chunk, and we may want to discuss further if we want
generation numbers or FELINE numbers or GRAIL or SCARAB, which are all
graph-related speedup mechanisms AFAICT.
In case we decide against generation numbers in the long run,
the row of mandatory generation numbers would be dead weight
that we'd need to carry.

Currently, the format includes 8 bytes to share between the generation
number and commit date. Due to alignment concerns, we will want to keep
this as 8 bytes or truncate it to 4 bytes. Either we would be wasting at
least 3 bytes or truncating dates too much (presenting the 2038 problem
[1] since dates are signed).
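To make the space trade-off concrete, here is a rough sketch of sharing one 8-byte field between the two values. The 30-bit/34-bit split, the big-endian layout, and the names pack_row and unpack_row are assumptions for illustration only; nothing above fixes them. The point is just that a date field wider than 31 bits avoids the 2038 cutoff while leaving room for a generation number in the same aligned 8 bytes.

```python
import struct

# Hypothetical split of the shared 8-byte field (assumption, not from the thread).
GEN_BITS = 30
DATE_BITS = 34


def pack_row(generation, commit_date):
    """Pack a generation number and a commit date into 8 big-endian bytes."""
    if not 0 <= generation < (1 << GEN_BITS):
        raise ValueError("generation does not fit in %d bits" % GEN_BITS)
    if not 0 <= commit_date < (1 << DATE_BITS):
        raise ValueError("commit date does not fit in %d bits" % DATE_BITS)
    # Generation in the high bits, date in the low bits.
    word = (generation << DATE_BITS) | commit_date
    return struct.pack(">Q", word)


def unpack_row(data):
    """Inverse of pack_row: return (generation, commit_date)."""
    (word,) = struct.unpack(">Q", data)
    return word >> DATE_BITS, word & ((1 << DATE_BITS) - 1)
```

With this split, a date one second past the signed 32-bit limit (2**31) still round-trips, whereas a signed 4-byte date field would overflow in January 2038.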

I only glanced at the paper, but it looks like a "more advanced 2d
generation number" that seems to be able to answer questions
that gen numbers can answer, but that paper also refers
to SCARAB as well as GRAIL as the state of the art, so maybe
there are even more papers to explore?

The biggest reason I can say to advance this series (and the small
follow-up series that computes and consumes generation numbers) is that
generation numbers are _extremely simple_. You only need to know your
parents and their generation numbers to compute your own. These other
reachability indexes require examining the entire graph to create "good"
index values.
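As a rough illustration of how little is needed, the sketch below computes generation numbers from nothing but a parent map. It assumes the convention that a commit with no parents gets generation 1 and every other commit gets one more than the maximum over its parents. The function name and the dict-based graph are made up for the example; this is not the code from the series.

```python
def generation_numbers(parents):
    """Compute generation numbers for a commit DAG.

    parents: dict mapping each commit id to a list of its parent ids.
    Assumed convention: parentless commits have generation 1.
    """
    gen = {}
    for tip in parents:
        # Explicit stack instead of recursion, since histories can be deep.
        stack = [tip]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Parents must be numbered before this commit can be.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max(
                    (gen[p] for p in parents[commit]), default=0
                )
                stack.pop()
    return gen
```

Each commit is numbered using only its own parents, so a new commit on top of an already numbered history can be numbered without looking at anything else.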

The hard part about using generation numbers (or any other reachability
index) in Git is refactoring the revision-walk machinery to take
advantage of them; current code requires O(reachable commits) to
topo-order instead of O(commits that will be output). I think we should
table any discussion of these advanced indexes until that work is done
and a valuable comparison can be done. "Premature optimization is the
root of all evil" and all that.
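For a sense of what "taking advantage" could look like, here is a toy reachability check that uses generation numbers to cut a walk short. It is only a sketch under assumptions: the can_reach name and the dict inputs are hypothetical, it is not Git's revision-walk code, and it relies on the property that every ancestor has a strictly smaller generation number than its descendants.

```python
def can_reach(parents, gen, start, target):
    """Return True if target is start itself or one of its ancestors.

    parents: dict mapping commit id -> list of parent ids.
    gen: dict mapping commit id -> generation number, where each commit's
         number is strictly greater than all of its parents' numbers.
    """
    floor = gen[target]
    seen = {start}
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        for parent in parents[commit]:
            # A commit below the target's generation cannot have the
            # target as an ancestor, so that part of history is skipped.
            if parent not in seen and gen[parent] >= floor:
                seen.add(parent)
                stack.append(parent)
    return False
```

Without the generation check, a negative answer would require walking everything reachable from start; with it, the walk stops at the target's generation.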


Thanks,
-Stolee

[1] https://en.wikipedia.org/wiki/Year_2038_problem
