Re: Question about the ahead-behind computation and status

Derrick Stolee Fri, 15 Dec 2017 11:41:03 -0800

On 12/15/2017 1:30 PM, Junio C Hamano wrote:

Derrick Stolee <sto...@gmail.com> writes:

The biggest reason for the 20 seconds is not just the number of
commits in the ahead/behind but how many commits are walked (including
common to both branches) before paint_down_to_common() breaks its
while loop due to queue_has_nonstale().

Hmm, queue_has_nonstale() looks to see if any element is not STALE
(where the definition of STALE is "known to be a common ancestor")
by potentially checking all elements in the queue.  I wonder if we
have an opportunity for a trivial optimization?  When the caller
knows that it dug one level and added the parents that are not
stale, it does not have to ask queue_has_nonstale() if there is any
non stale element, for example.

I thought this, too, but my tracing did not show significant time spentin this method. 99% of the time is spent unzipping and parsing commits.

If this was taking too long, then we could track a minimum timestamp fora commit that entered the queue in a non-stale state, but this willdelay the termination condition a bit since commits can be marked staleafter they enter the queue.

What do you exactly mean by "not just the number of commits in the
ahead/behind"?  Aren't the number of these commits pretty much
proportional to the number of commits we need to paint down to
common ancestors?  Is the latter a lot larger than the former
(i.e. are we somehow not stopping when we _could_ notice that we
can with better information)?

With the wide history, there is actually a large set of commits that arein the common history but have newer commit times than the oldest commitin only one history. Consider the following ASCII art:


  A
  |
  1
  |
  2
  |
  3
  |\
  4 B
  |\|
  5 C
  |\|
  6 D
  |\|
   .
   .
   .
  |\|
  N Z
  |/
  0

Between A and B, A is ahead by commits {A,1,2,3,4,5,6,...,N}. Meanwhile,commits B,C,D,...,Z are in the common history, but still must be walked.

Now imagine these two sets are actually much MUCH wider (thousands ofcommits that are pairwise non-ancestors). This causes the number ofwalked commits to be larger than just the number of commits in thesymmetric difference of the histories.

Unfortunately, generation numbers are not the only optimization neededto make this call be sub-100ms. A graph data structure that lists theedges between commits would prevent the time spent in zlib decompressingand parsing commits. I'm working on investigating how such a datastructure (and corresponding file format) could integrate in thecommit-walking code.


Thanks,
-Stolee

Re: Question about the ahead-behind computation and status

Reply via email to