Re: [PATCH 0/6] Compute and consume generation numbers

Derrick Stolee Wed, 11 Apr 2018 12:59:33 -0700

On 4/11/2018 3:32 PM, Jakub Narebski wrote:

What would you suggest as a good test that could imply performance? The
Google Colab notebook linked to above includes a function to count
number of commits (nodes / vertices in the commit graph) walked,
currently in the worst case scenario.


The two main questions to consider are:

1. Can X reach Y?
2. What is the set of merge-bases between X and Y?

And the thing to measure is a commit count. If possible, it would begood to count commits walked (commits whose parent list is enumerated)and commits inspected (commits that were listed as a parent of somewalked commit). Walked commits require a commit parse -- albeit from thecommit-graph instead of the ODB now -- while inspected commits onlycheck the in-memory cache.

For git.git and Linux, I like to use the release tags as tests. Theyprovide a realistic view of the linear history, and maintenance releaseshave their own history from the major releases.

I have tried finding number of false positives for level (generation
number) filter and for FELINE index, and number of false negatives for
min-post intervals in the spanning tree (for DFS tree) for 10000
randomly selected pairs of commits... but I don't think this is a good
benchmark.

What is a false-positive? A case where gen(X) < gen(Y) but Y cannotreach X? I do not think that is a great benchmark, but I guess it issomething to measure.

I Linux kernel sources 
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
that has 750832 nodes and 811733 edges, and 563747941392 possible
directed pairs, we have for 10000 randomly selected pairs of commits:

   level-filter has    91 =  0.91% [all] false positives
   FELINE index has    78 =  0.78% [all] false positives
   FELINE index has 1.16667 less false positives than level filter

   min-post spanning-tree intervals has  3641 = 36.41% [all] false
   negatives

Perhaps something you can do instead of sampling from N^2 commits intotal is to select a pair of generations (say, G = 20000, G' = 20100) orregions of generations ( 20000 <= G <= 20050, 20100 <= G' <= 20150) andsee how many false positives you see by testing all pairs (one from eachlevel). The delta between the generations may need to be smaller toactually have a large proportion of unreachable pairs. Try differentlevels, since major version releases tend to "pinch" the commit graph toa common history.

For git.git repository (https://github.com/git/git.git) that has 52950
nodes and 65887 edges the numbers are slighly more in FELINE index
favor (also out of 10000 random pairs):

   level-filter has   504 =  9.11% false positives
   FELINE index has   125 =  2.26% false positives
   FELINE index has 4.032 less false positives than level filter

This is for FELINE which does not use level / generatio-numbers filter.


Thanks,
-Stolee

Re: [PATCH 0/6] Compute and consume generation numbers

Reply via email to