Re: [PATCH v4 0/7] Use generation numbers for --topo-order

2018-11-01 Thread Junio C Hamano
Derrick Stolee  writes:

>> Review discussions seem to have petered out.  Would we merge this to
>> 'next' and start cooking, perhaps for the remainder of this cycle?
>
> Thanks, but I've just sent a v5 responding to Jakub's feedback on v4. [1]
>
> I'd be happy to let it sit in next until you feel it has cooked long
> enough. I'm available to respond to feedback in the form of new
> topics.

OK.  I'm quite happy to see this round of review helped greatly by
Jakub, by the way.

THanks, both.


Re: [PATCH v4 0/7] Use generation numbers for --topo-order

2018-11-01 Thread Derrick Stolee

On 11/1/2018 1:21 AM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:


This patch series performs a decently-sized refactoring of the revision-walk
machinery. Well, "refactoring" is probably the wrong word, as I don't
actually remove the old code. Instead, when we see certain options in the
'rev_info' struct, we redirect the commit-walk logic to a new set of methods
that distribute the workload differently. By using generation numbers in the
commit-graph, we can significantly improve 'git log --graph' commands (and
the underlying 'git rev-list --topo-order').

Review discussions seem to have petered out.  Would we merge this to
'next' and start cooking, perhaps for the remainder of this cycle?


Thanks, but I've just sent a v5 responding to Jakub's feedback on v4. [1]

I'd be happy to let it sit in next until you feel it has cooked long 
enough. I'm available to respond to feedback in the form of new topics.


Thanks,
-Stolee

[1] 
https://public-inbox.org/git/20181101134623.84055-1-dsto...@microsoft.com/T/#u


Re: [PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-31 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> This patch series performs a decently-sized refactoring of the revision-walk
> machinery. Well, "refactoring" is probably the wrong word, as I don't
> actually remove the old code. Instead, when we see certain options in the
> 'rev_info' struct, we redirect the commit-walk logic to a new set of methods
> that distribute the workload differently. By using generation numbers in the
> commit-graph, we can significantly improve 'git log --graph' commands (and
> the underlying 'git rev-list --topo-order').

Review discussions seem to have petered out.  Would we merge this to
'next' and start cooking, perhaps for the remainder of this cycle?


Re: [PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-21 Thread Jakub Narebski
"Derrick Stolee via GitGitGadget"  writes:

> This patch series performs a decently-sized refactoring of the revision-walk
> machinery. Well, "refactoring" is probably the wrong word, as I don't
> actually remove the old code. Instead, when we see certain options in the
> 'rev_info' struct, we redirect the commit-walk logic to a new set of methods
> that distribute the workload differently. By using generation numbers in the
> commit-graph, we can significantly improve 'git log --graph' commands (and
> the underlying 'git rev-list --topo-order').
>
> On the Linux repository, I got the following performance results when
> comparing to the previous version with or without a commit-graph:
>
> Test: git rev-list --topo-order -100 HEAD
> HEAD~1, no commit-graph: 6.80 s
> HEAD~1, w/ commit-graph: 0.77 s
>   HEAD, w/ commit-graph: 0.02 s
>
> Test: git rev-list --topo-order -100 HEAD -- tools
> HEAD~1, no commit-graph: 9.63 s
> HEAD~1, w/ commit-graph: 6.06 s
>   HEAD, w/ commit-graph: 0.06 s

I wonder if we could make use of existing infrstructure in 't/perf/' to
perform those benchmarks for us (perhaps augmented with large
repository, and only if requested -- similarly to how long tests are
handled).

--
Jakub Narębski


[PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-16 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk
machinery. Well, "refactoring" is probably the wrong word, as I don't
actually remove the old code. Instead, when we see certain options in the
'rev_info' struct, we redirect the commit-walk logic to a new set of methods
that distribute the workload differently. By using generation numbers in the
commit-graph, we can significantly improve 'git log --graph' commands (and
the underlying 'git rev-list --topo-order').

On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:

Test: git rev-list --topo-order -100 HEAD
HEAD~1, no commit-graph: 6.80 s
HEAD~1, w/ commit-graph: 0.77 s
  HEAD, w/ commit-graph: 0.02 s

Test: git rev-list --topo-order -100 HEAD -- tools
HEAD~1, no commit-graph: 9.63 s
HEAD~1, w/ commit-graph: 6.06 s
  HEAD, w/ commit-graph: 0.06 s

If you want to read this series but are unfamiliar with the commit-graph and
generation numbers, then I recommend reading 
Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].

Since revision.c is an incredibly important (and old) portion of the
codebase -- and because there are so many orthogonal options in 'struct
rev_info' -- I consider this submission to be "RFC quality". That is, I am
not confident that I am not missing anything, or that my solution is the
best it can be. I did merge this branch with ds/commit-graph-with-grafts and
the "DO-NOT-MERGE: write and read commit-graph always" commit that computes
a commit-graph with every 'git commit' command. The test suite passed with
that change, available on GitHub [3]. To ensure that I cover at least the
case I think are interesting, I added tests to t6600-test-reach.sh to verify
the walks report the correct results for the three cases there (no
commit-graph, full commit-graph, and a partial commit-graph so the walk
starts at GENERATION_NUMBER_INFINITY).

One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending commits
are UNINTERESTING. Since this code depends on commit_list instead of the
prio_queue we are using here, I chose to leave it untouched for now. We can
revisit it in a separate series later. Since handle_commit() turns on
revs->limited when a commit is UNINTERESTING, we do not hit the new code in
this case. Removing this 'revs->limited = 1;' line yields correct results,
but the performance is worse.

This series was based on ds/reachable, but is now based on 'master' to not
conflict with 182070 "commit: use timestamp_t for author_date_slab". There
is a small conflict with md/filter-trees, because it renamed a flag in
revisions.h in the line before I add new flags. Hopefully this conflict is
not too difficult to resolve.

Changes in V3: I added a new patch that updates the tab-alignment for flags
in revision.h before adding new ones (Thanks, Ævar!). Also, I squashed the
recommended changes to run_three_modes and test_three_modes from Szeder and
Junio. Thanks!

Changes in V4: I'm sending a V4 to respond to the feedback so far. Still
looking forward to more on the really big commit!

 * Removed the whitespace changes to the flags in revision.c that caused
   merge pain. 
   
   
 * The prio-queue peek function is now covered by tests when in "stack"
   mode.
   
   
 * The "add_parents_to_list()" function is now renamed to
   "process_parents()"
   
   
 * Added a new commit that expands test coverage with alternate orders and
   file history (use GIT_TEST_COMMIT_GRAPH to have
   t6012-rev-list-simplify.sh cover the new logic). These tests found a
   problem with author dates (I forgot to record them during the explore
   walk).
   
   
 * Commit message edits.
   
   

Thanks, -Stolee

[1] 
https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
Supercharging the Git Commit Graph III: Generations and Graph Algorithms

[2] 
https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
Animation showing three-part walk

[3] https://github.com/derrickstolee/git/tree/topo-order/testA branch
containing this series along with commits to compute commit-graph in entire
test suite.

Cc: avarab@gmail.comCc: szeder@gmail.com

Derrick Stolee (7):
  prio-queue: add 'peek' operation
  test-reach: add run_three_modes method
  test-reach: add rev-list tests
  revision.c: begin refactoring --topo-order logic
  commit/revisions: bookkeeping before refactoring
  revision.c: generation-based topo-order algorithm
  t6012: make rev-list tests more interesting

 commit.c |  11 +-
 commit.h |   8 ++
 object.h