subject:"Re\: Question about the ahead\-behind computation and status"

Re: Question about the ahead-behind computation and status

2017-12-15 Thread Derrick Stolee


On 12/15/2017 1:30 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


The biggest reason for the 20 seconds is not just the number of
commits in the ahead/behind but how many commits are walked (including
common to both branches) before paint_down_to_common() breaks its
while loop due to queue_has_nonstale().

Hmm, queue_has_nonstale() looks to see if any element is not STALE
(where the definition of STALE is "known to be a common ancestor")
by potentially checking all elements in the queue.  I wonder if we
have an opportunity for a trivial optimization?  When the caller
knows that it dug one level and added the parents that are not
stale, it does not have to ask queue_has_nonstale() if there is any
non stale element, for example.


I thought this, too, but my tracing did not show significant time spent 
in this method. 99% of the time is spent unzipping and parsing commits.


If this was taking too long, then we could track a minimum timestamp for 
a commit that entered the queue in a non-stale state, but this will 
delay the termination condition a bit since commits can be marked stale 
after they enter the queue.



What do you exactly mean by "not just the number of commits in the
ahead/behind"?  Aren't the number of these commits pretty much
proportional to the number of commits we need to paint down to
common ancestors?  Is the latter a lot larger than the former
(i.e. are we somehow not stopping when we _could_ notice that we
can with better information)?




With the wide history, there is actually a large set of commits that are 
in the common history but have newer commit times than the oldest commit 
in only one history. Consider the following ASCII art:


  A
  |
  1
  |
  2
  |
  3
  |\
  4 B
  |\|
  5 C
  |\|
  6 D
  |\|
   .
   .
   .
  |\|
  N Z
  |/
  0

Between A and B, A is ahead by commits {A,1,2,3,4,5,6,...,N}. Meanwhile, 
commits B,C,D,...,Z are in the common history, but still must be walked.


Now imagine these two sets are actually much MUCH wider (thousands of 
commits that are pairwise non-ancestors). This causes the number of 
walked commits to be larger than just the number of commits in the 
symmetric difference of the histories.


Unfortunately, generation numbers are not the only optimization needed 
to make this call be sub-100ms. A graph data structure that lists the 
edges between commits would prevent the time spent in zlib decompressing 
and parsing commits. I'm working on investigating how such a data 
structure (and corresponding file format) could integrate in the 
commit-walking code.


Thanks,
-Stolee

Re: Question about the ahead-behind computation and status

2017-12-15 Thread Junio C Hamano

Derrick Stolee  writes:

> The biggest reason for the 20 seconds is not just the number of
> commits in the ahead/behind but how many commits are walked (including
> common to both branches) before paint_down_to_common() breaks its
> while loop due to queue_has_nonstale().

Hmm, queue_has_nonstale() looks to see if any element is not STALE
(where the definition of STALE is "known to be a common ancestor")
by potentially checking all elements in the queue.  I wonder if we
have an opportunity for a trivial optimization?  When the caller
knows that it dug one level and added the parents that are not
stale, it does not have to ask queue_has_nonstale() if there is any
non stale element, for example.

What do you exactly mean by "not just the number of commits in the
ahead/behind"?  Aren't the number of these commits pretty much
proportional to the number of commits we need to paint down to
common ancestors?  Is the latter a lot larger than the former
(i.e. are we somehow not stopping when we _could_ notice that we
can with better information)?

Re: Question about the ahead-behind computation and status

2017-12-15 Thread Derrick Stolee


On 12/15/2017 10:08 AM, Jeff Hostetler wrote:

On 12/15/2017 5:08 AM, Jeff King wrote:

On Thu, Dec 14, 2017 at 04:49:31PM -0500, Jeff Hostetler wrote:

[*] Sadly, the local repo was only about 20 days out of
 date (including the Thanksgiving holidays)


Taking 20 seconds to traverse 20 days worth of history sounds pretty
awful. How many commits is it?


150,000 commits, give or take.  The graph is many thousands of lanes
wide because of the merges and that isn't helping.

(But I should give you folks lots of credit -- it *only* took 20
seconds to find, unzip and decode 150,000 commit objects and walk
the commit graph.)


To give a few more data points, I created similar situation by checking 
out a big repo I hadn't updated in three months and it was 16,000 
commits behind. That took 1.5s to calculate the ahead/behind. Moving it 
100,000 commits behind it took 5s. This repo has about 300,000 total 
commits versus the 1.5 million commits in the Windows repo.


The biggest reason for the 20 seconds is not just the number of commits 
in the ahead/behind but how many commits are walked (including common to 
both branches) before paint_down_to_common() breaks its while loop due 
to queue_has_nonstale().


Thanks,
-Stolee

Re: Question about the ahead-behind computation and status

2017-12-15 Thread Jeff Hostetler




On 12/15/2017 5:08 AM, Jeff King wrote:

On Thu, Dec 14, 2017 at 04:49:31PM -0500, Jeff Hostetler wrote:


I don't want to jump into the graph algorithm at this time,
but was wondering about adding a --no-ahead-behind flag (or
something similar or a config setting) that would disable
the a/b computation during status.

For status V2 output, we could omit the "# branch.ab x y"
line.  For normal status output, change the prose a/b
message to say something like "are [not] up to date".


Is it actually "status --porcelain=v2" that you care about here? Or is
it normal "git status"?


My primary concern is for porcelain V2 because Visual Studio
uses --branch with V2 to get the other branch details with
one process rather than two on inotify events.

Fixing/Helping normal "git status" would be nice for interactive
users.

 

Do you care about seeing the branch at all? I.e., would "--no-branch" be
a viable option (though I notice it seems to be a silent noop with the
long-form, which should probably be fixed).


VS does not use the a/b counts at all, so it is just overhead.
VS does use some of the other branch fields, so it would be nice
to still have them.

It would be nice to minimize changes to the user experience.
Not printing anything about the a/b in the normal/short/long forms
might give the user a false sense of security.  I was thinking if
we changed the message to be "you are [not] up to date", they would
still know they needed to push.

As it is, at this scale, having the message give an exact count
just isn't that useful -- who cares if you are 150,000 or 149,000
behind.

Maybe we could do something like GCC and just give up at 100
and print "you are 100+ ...".

 

If you really do want to see all branch details but just skip the
ahead/behind, then yeah, I'd agree that adding an option to do so makes
sense. Is "status" the only command that needs it? I think we do it
unconditionally with "git checkout", too.


ok. thanks. i'll look at doing that.

Yeah, "checkout" and "branch -vv" could use it too.  I haven't looked
yet to see anywhere else we might want it, but these can wait.



[*] Sadly, the local repo was only about 20 days out of
 date (including the Thanksgiving holidays)


Taking 20 seconds to traverse 20 days worth of history sounds pretty
awful. How many commits is it?


150,000 commits, give or take.  The graph is many thousands of lanes
wide because of the merges and that isn't helping.

(But I should give you folks lots of credit -- it *only* took 20
seconds to find, unzip and decode 150,000 commit objects and walk
the commit graph.)

For a true solution, I think we want to wait for client-side
generation numbers before we spend any time looking at the
algorithm.

Thanks,
Jeff

Re: Question about the ahead-behind computation and status

2017-12-15 Thread Jeff King

On Thu, Dec 14, 2017 at 04:49:31PM -0500, Jeff Hostetler wrote:

> I don't want to jump into the graph algorithm at this time,
> but was wondering about adding a --no-ahead-behind flag (or
> something similar or a config setting) that would disable
> the a/b computation during status.
> 
> For status V2 output, we could omit the "# branch.ab x y"
> line.  For normal status output, change the prose a/b
> message to say something like "are [not] up to date".

Is it actually "status --porcelain=v2" that you care about here? Or is
it normal "git status"?

Do you care about seeing the branch at all? I.e., would "--no-branch" be
a viable option (though I notice it seems to be a silent noop with the
long-form, which should probably be fixed).

If you really do want to see all branch details but just skip the
ahead/behind, then yeah, I'd agree that adding an option to do so makes
sense. Is "status" the only command that needs it? I think we do it
unconditionally with "git checkout", too.

> [*] Sadly, the local repo was only about 20 days out of
> date (including the Thanksgiving holidays)

Taking 20 seconds to traverse 20 days worth of history sounds pretty
awful. How many commits is it?

-Peff

Re: Question about the ahead-behind computation and status

Re: Question about the ahead-behind computation and status

Re: Question about the ahead-behind computation and status

Re: Question about the ahead-behind computation and status

Re: Question about the ahead-behind computation and status

5 matches

Site Navigation

Mail list logo

Footer information