Re: cherry-pick very slow on big repository

2017-11-21 Thread Elijah Newren
On Tue, Nov 21, 2017 at 4:07 AM, Peter Krefting  wrote:
> Elijah Newren:
>
>> Sure, take a look at the big-repo-small-cherry-pick branch of
>> https://github.com/newren/git
>
>
> With those changes, the time usage is the same as if I set
> merge.renameLimit=1 for the repository, and the end result is identical:
>
> $ time /usr/local/stow/git-v2.15.0-323-g31fe956618/bin/git cherry-pick -x
> 717eb328940ca2e33f14ed27576e656327854b7b
> [redacted 19be3551bc] Redacted
>  Author: Redacted 
>  Date: Mon Oct 16 15:58:05 2017 +0200
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> real    0m15,345s
> user    0m14,908s
> sys     0m0,528s
>
> Thanks!


Cool, glad it worked for you.  Thanks for testing it out.


Re: cherry-pick very slow on big repository

2017-11-21 Thread Peter Krefting

Elijah Newren:

> Sure, take a look at the big-repo-small-cherry-pick branch of
> https://github.com/newren/git

With those changes, the time usage is the same as if I set
merge.renameLimit=1 for the repository, and the end result is identical:

$ time /usr/local/stow/git-v2.15.0-323-g31fe956618/bin/git cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
[redacted 19be3551bc] Redacted
 Author: Redacted 
 Date: Mon Oct 16 15:58:05 2017 +0200
 1 file changed, 2 insertions(+), 2 deletions(-)

real    0m15,345s
user    0m14,908s
sys     0m0,528s

Thanks!

--
\\// Peter - http://www.softwolves.pp.se/


Re: cherry-pick very slow on big repository

2017-11-13 Thread Elijah Newren
On Mon, Nov 13, 2017 at 3:22 AM, Peter Krefting  wrote:
> Elijah Newren:
>
>> I would be very interested to hear how my rename detection performance
>> patches work for you; this kind of use case is exactly the one they were
>> designed to help the most.  See
>> https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/
>
> I'd be happy to try them out. Is there a public repo where I can pull these
> patches from instead of trying to apply them manually, as there are several
> patch series involved here?

Sure, take a look at the big-repo-small-cherry-pick branch of
https://github.com/newren/git


Re: cherry-pick very slow on big repository

2017-11-13 Thread Peter Krefting

Elijah Newren:

> I would be very interested to hear how my rename detection
> performance patches work for you; this kind of use case is exactly the
> one they were designed to help the most.  See
> https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/

I'd be happy to try them out. Is there a public repo where I can pull 
these patches from instead of trying to apply them manually, as there 
are several patch series involved here?


--
\\// Peter - http://www.softwolves.pp.se/


RE: cherry-pick very slow on big repository

2017-11-13 Thread Peter Krefting

Kevin Willford:

> Since this is happening during a merge, you might need to use merge.renameLimit
> or the merge strategy option of -Xno-renames.  Although the code does fall back
> to diff.renameLimit, a lot of work is done before the rename limit is even
> checked, so I would first try turning renames off.

That makes quite a large difference; with this setting it finishes in
about 15 seconds instead of six minutes:

  $ time git -c merge.renameLimit=1 cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
  [redacted 0576fbaf89] Redacted
   Author: Redacted 
   Date: Mon Oct 16 15:58:05 2017 +0200
   1 file changed, 2 insertions(+), 2 deletions(-)

  real    0m15,473s
  user    0m14,904s
  sys     0m0,488s

I'll add this setting for the repository for the future, thank you!

--
\\// Peter - http://www.softwolves.pp.se/


Re: cherry-pick very slow on big repository

2017-11-10 Thread Elijah Newren
On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting  wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touch files that are in the exact same location on both branches.

I would be very interested to hear how my rename detection performance
patches work for you; this kind of use case is exactly the one they were
designed to help the most.  See
https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/


Re: cherry-pick very slow on big repository

2017-11-10 Thread Elijah Newren
Interesting timing.  I have some performance patches specifically
developed because rename detection during merges made a small
cherry-pick in a large repo rather slow.  In my case, I dropped the
time for the cherry-pick by a factor of about 30 (no guarantees you'll
see the same; it's very history-specific).  I was just about to start
sending my three series of patches, the performance one being the
third...

On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting  wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touch files that are in the exact same location on both branches.
>
>> You can remove this rename detection by running your cherry-pick with `git
>> -c diff.renameLimit=1 cherry-pick ...`
>
>
> That didn't work; in fact, it failed to finish with this setting in
> effect. It hangs in such a way that I can't stop it with Ctrl+C (neither
> when running from the command line, nor when running inside gdb). It
> didn't finish in the 20 minutes I gave it.
>
> I also tried with diff.renames=false, which also seemed to fail.
>
>
> --
> \\// Peter - http://www.softwolves.pp.se/


RE: cherry-pick very slow on big repository

2017-11-10 Thread Kevin Willford
Since this is happening during a merge, you might need to use merge.renameLimit
or the merge strategy option of -Xno-renames.  Although the code does fall back
to diff.renameLimit, a lot of work is done before the rename limit is even
checked, so I would first try turning renames off.
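A minimal C sketch of the precedence described above, assuming a simplified options struct: merge.renameLimit wins if it is set, otherwise diff.renameLimit is used, otherwise a built-in fallback applies; -Xno-renames is modelled as a flag that skips rename detection altogether. The struct, its field names and LIMIT_UNSET are hypothetical and are not git's internal API, and the fallback value is passed in rather than asserting any particular git version's default.

```c
#include <stdbool.h>

/* Hypothetical marker for "option not configured"; not a git constant. */
#define LIMIT_UNSET (-1)

/* Hypothetical options struct for illustration only; not git's struct. */
struct merge_rename_opts {
    bool no_renames;        /* models -Xno-renames */
    int merge_rename_limit; /* models merge.renameLimit, or LIMIT_UNSET */
    int diff_rename_limit;  /* models diff.renameLimit, or LIMIT_UNSET */
};

/*
 * With -Xno-renames, rename detection is skipped before any added or
 * deleted path is examined, so no limit is ever consulted.
 */
bool renames_enabled(const struct merge_rename_opts *o)
{
    return !o->no_renames;
}

/*
 * Picks the limit in the order described in the message:
 * merge.renameLimit first, then diff.renameLimit, then a caller-supplied
 * fallback (deliberately not hard-coded to any git version's default).
 */
int effective_rename_limit(const struct merge_rename_opts *o, int fallback)
{
    if (o->merge_rename_limit != LIMIT_UNSET)
        return o->merge_rename_limit;
    if (o->diff_rename_limit != LIMIT_UNSET)
        return o->diff_rename_limit;
    return fallback;
}
```

For example, `git -c merge.renameLimit=1 cherry-pick ...` corresponds to merge_rename_limit = 1 here, which takes effect whatever diff.renameLimit is set to.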

Thanks,
Kevin

> -----Original Message-----
> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf
> Of Peter Krefting
> Sent: Friday, November 10, 2017 7:05 AM
> To: Derrick Stolee <sto...@gmail.com>
> Cc: Jeff King <p...@peff.net>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: cherry-pick very slow on big repository
> 
> Derrick Stolee:
> 
> > Git is spending time detecting renames, which implies you probably
> > renamed a folder or added and deleted a large number of files. This
> > rename detection is quadratic (# adds times # deletes).
> 
> Yes, a couple of directories with a lot of template files have been
> renamed (and some removed, some added) between the current development
> branch and this old maintenance branch. I get the "Performing inexact
> rename detection" a lot when merging changes in the other direction.
> 
> However, none of them applies to these particular commits, which only
> touch files that are in the exact same location on both branches.
> 
> > You can remove this rename detection by running your cherry-pick
> > with `git -c diff.renameLimit=1 cherry-pick ...`
> 
> That didn't work; in fact, it failed to finish with this setting in
> effect. It hangs in such a way that I can't stop it with Ctrl+C
> (neither when running from the command line, nor when running inside
> gdb). It didn't finish in the 20 minutes I gave it.
> 
> I also tried with diff.renames=false, which also seemed to fail.
> 
> --
> \\// Peter - http://www.softwolves.pp.se/


Re: cherry-pick very slow on big repository

2017-11-10 Thread Peter Krefting

Derrick Stolee:

> Git is spending time detecting renames, which implies you probably
> renamed a folder or added and deleted a large number of files. This
> rename detection is quadratic (# adds times # deletes).


Yes, a couple of directories with a lot of template files have been 
renamed (and some removed, some added) between the current development 
branch and this old maintenance branch. I get the "Performing inexact 
rename detection" a lot when merging changes in the other direction.


However, none of them applies to these particular commits, which only
touch files that are in the exact same location on both branches.


> You can remove this rename detection by running your cherry-pick
> with `git -c diff.renameLimit=1 cherry-pick ...`


That didn't work; in fact, it failed to finish with this setting in
effect. It hangs in such a way that I can't stop it with Ctrl+C
(neither when running from the command line, nor when running inside
gdb). It didn't finish in the 20 minutes I gave it.


I also tried with diff.renames=false, which also seemed to fail.

--
\\// Peter - http://www.softwolves.pp.se/


Re: cherry-pick very slow on big repository

2017-11-10 Thread Derrick Stolee

On 11/10/2017 7:37 AM, Peter Krefting wrote:

> Jeff King:
>
>> Can you get a backtrace? I'd do something like:
>
> It seems to spend most of its time in diffcore_count_changes(); that is
> where it stops whenever I hit Ctrl+C (various line numbers 199-207 in
> diffcore-delta.c; this is on the v2.15.0 tag).


> (gdb) bt
> #0  diffcore_count_changes (src=src@entry=0x5db99970,
>     dst=dst@entry=0x5d6a4810,
>     src_count_p=src_count_p@entry=0x5db8,
>     dst_count_p=dst_count_p@entry=0x5d6a4838,
>     src_copied=src_copied@entry=0x7fffd3e0,
>     literal_added=literal_added@entry=0x7fffd3f0)
>     at diffcore-delta.c:203
> #1  0x556dee1a in estimate_similarity (minimum_score=3,
>     dst=0x5d6a4810, src=0x5db99970) at diffcore-rename.c:193
> #2  diffcore_rename (options=options@entry=0x7fffd4f0)
>     at diffcore-rename.c:560
> #3  0x55623d83 in diffcore_std (
>     options=options@entry=0x7fffd4f0) at diff.c:5846
> ...


Git is spending time detecting renames, which implies you probably 
renamed a folder or added and deleted a large number of files. This 
rename detection is quadratic (# adds times # deletes).


You can remove this rename detection by running your cherry-pick with 
`git -c diff.renameLimit=1 cherry-pick ...`


See https://git-scm.com/docs/diff-config#diff-config-diffrenameLimit
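To make the quadratic cost concrete, here is a small C sketch under a toy model rather than git's code: the number of similarity estimates is the number of added paths times the number of deleted paths, and the limit check rejects a candidate matrix larger than renameLimit x renameLimit, which is roughly how diffcore-rename.c bounds the work. The real check has additional cases and treats an unset or non-positive limit differently; the function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Inexact rename detection compares every added path against every
 * deleted path, so the number of similarity estimates is the product.
 * 64-bit arithmetic avoids overflow for large trees.
 */
uint64_t rename_candidate_pairs(uint32_t num_added, uint32_t num_deleted)
{
    return (uint64_t)num_added * (uint64_t)num_deleted;
}

/*
 * Toy gate: refuse inexact detection when the candidate matrix is larger
 * than a rename_limit x rename_limit square. This sketch does not model
 * git's special handling of an unset or non-positive limit.
 */
bool too_many_candidates(uint32_t num_added, uint32_t num_deleted,
                         uint32_t rename_limit)
{
    uint64_t max_pairs = (uint64_t)rename_limit * (uint64_t)rename_limit;
    return rename_candidate_pairs(num_added, num_deleted) > max_pairs;
}
```

With renameLimit=1, anything beyond a single added/deleted pair exceeds the 1 x 1 matrix, so inexact detection is skipped; otherwise, 2000 additions against 1500 deletions would already mean 3,000,000 pairwise estimates.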

Thanks,
-Stolee


Re: cherry-pick very slow on big repository

2017-11-10 Thread Peter Krefting

Jeff King:


> Can you get a backtrace? I'd do something like:

It seems to spend most of its time in diffcore_count_changes(); that is
where it stops whenever I hit Ctrl+C (various line numbers 199-207 in
diffcore-delta.c; this is on the v2.15.0 tag).


(gdb) bt
#0  diffcore_count_changes (src=src@entry=0x5db99970,
    dst=dst@entry=0x5d6a4810,
    src_count_p=src_count_p@entry=0x5db8,
    dst_count_p=dst_count_p@entry=0x5d6a4838,
    src_copied=src_copied@entry=0x7fffd3e0,
    literal_added=literal_added@entry=0x7fffd3f0)
    at diffcore-delta.c:203
#1  0x556dee1a in estimate_similarity (minimum_score=3,
    dst=0x5d6a4810, src=0x5db99970) at diffcore-rename.c:193
#2  diffcore_rename (options=options@entry=0x7fffd4f0)
    at diffcore-rename.c:560
#3  0x55623d83 in diffcore_std (
    options=options@entry=0x7fffd4f0) at diff.c:5846
#4  0x5564ab46 in get_renames (o=o@entry=0x7fffd850,
    tree=tree@entry=0x559d1b98,
    o_tree=o_tree@entry=0x559d1bc0,
    a_tree=a_tree@entry=0x559d1b98,
    b_tree=b_tree@entry=0x559d1b70,
    entries=entries@entry=0x59351d20) at merge-recursive.c:554
#5  0x5564e7d9 in merge_trees (o=o@entry=0x7fffd850,
    head=head@entry=0x559d1b98, merge=<optimized out>,
    merge@entry=0x559d1b70, common=<optimized out>,
    common@entry=0x559d1bc0, result=result@entry=0x7fffd830)
    at merge-recursive.c:1985
#6  0x5569b2cc in do_recursive_merge (opts=0x7fffdf70,
    msgbuf=0x7fffd810, head=0x7fffd7f0,
    next_label=<optimized out>, base_label=<optimized out>,
    next=<optimized out>, base=0x559c1ba0) at sequencer.c:459
#7  do_pick_commit (command=TODO_PICK,
    commit=commit@entry=0x559c1b60,
    opts=opts@entry=0x7fffdf70, final_fixup=final_fixup@entry=0)
    at sequencer.c:1088
#8  0x5569e324 in single_pick (opts=0x7fffdf70,
    cmit=0x559c1b60) at sequencer.c:2306
#9  sequencer_pick_revisions (opts=0x7fffdf70)
    at sequencer.c:2355
#10 0x555d4097 in run_sequencer (argc=1, argc@entry=3,
    argv=argv@entry=0x7fffe320, opts=<optimized out>,
    opts@entry=0x7fffdf70) at builtin/revert.c:200
#11 0x555d449a in cmd_cherry_pick (argc=3,
    argv=0x7fffe320, prefix=<optimized out>)
    at builtin/revert.c:225
#12 0x55567a38 in run_builtin (argv=<optimized out>,
    argc=<optimized out>, p=<optimized out>) at git.c:346
#13 handle_builtin (argc=3, argv=0x7fffe320) at git.c:554
#14 0x55567cf6 in run_argv (argv=0x7fffe0e0,
    argcp=0x7fffe0ec) at git.c:606
#15 cmd_main (argc=<optimized out>, argv=<optimized out>)
    at git.c:683
#16 0x55566e01 in main (argc=4, argv=0x7fffe318)
    at common-main.c:43
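Per the backtrace, estimate_similarity() calls diffcore_count_changes() to measure how much content two blobs share, and that happens for candidate added/deleted pairs. The C sketch below is a simplified, hypothetical stand-in for that measurement, not git's algorithm: it cuts content into newline-terminated chunks, tallies bytes per chunk hash (FNV-1a into a fixed bucket array), and counts bytes present in both versions as copied. Each call is linear in the file sizes; the cost in this thread comes from making such a call for many pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Bucket count for the toy tally; collisions can slightly overstate similarity. */
#define NBUCKETS 1024

/* FNV-1a over one chunk; an arbitrary choice for this sketch, not git's hash. */
static uint32_t chunk_hash(const char *s, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Adds the byte length of each newline-terminated chunk to its hash bucket. */
static void tally(const char *buf, size_t len, size_t counts[NBUCKETS])
{
    size_t start = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n' || i + 1 == len) {
            size_t n = i - start + 1;
            counts[chunk_hash(buf + start, n) % NBUCKETS] += n;
            start = i + 1;
        }
    }
}

/* Similarity in percent: bytes found in both versions over the larger size. */
unsigned similarity_percent(const char *src, size_t src_len,
                            const char *dst, size_t dst_len)
{
    size_t a[NBUCKETS] = {0}, b[NBUCKETS] = {0};
    size_t copied = 0;
    size_t max_len = src_len > dst_len ? src_len : dst_len;
    if (max_len == 0)
        return 100;
    tally(src, src_len, a);
    tally(dst, dst_len, b);
    for (size_t k = 0; k < NBUCKETS; k++)
        copied += a[k] < b[k] ? a[k] : b[k];
    return (unsigned)(copied * 100 / max_len);
}
```

For instance, comparing "a\nb\nc\nd\n" with "a\nb\nc\nX\n" shares three of the four 2-byte lines, giving 75 percent. git uses its own chunking rules and hash table, so its scores will differ in detail.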

--
\\// Peter - http://www.softwolves.pp.se/


Re: cherry-pick very slow on big repository

2017-11-10 Thread Jeff King
On Fri, Nov 10, 2017 at 10:39:39AM +0100, Peter Krefting wrote:

> Running strace, it seems like it is doing lstat(), open(), mmap(), close()
> and munmap() on every single file in the repository, which takes a lot of
> time.
> 
> I thought it was just updating the status, but "git status" returns
> immediately, while cherry-picking takes several minutes for every
> cherry-pick I do.

It kind of sounds like a temporary index is being refreshed that doesn't
have the proper stat information.
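As an illustration of this hypothesis (not a description of git's actual refresh code), here is a minimal C sketch of stat-based change detection: a cheap lstat() comparison against cached size, mtime and inode can prove a file unchanged, and only when it cannot does the content have to be opened and read, which would produce the per-file lstat()/open()/mmap() pattern seen in strace. The struct cached_stat and both functions are hypothetical; git's real comparison (ie_match_stat() in read-cache.c) checks more fields and handles racy timestamps.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical cached stat data, loosely like an index entry; not git's struct. */
struct cached_stat {
    off_t size;
    time_t mtime;
    ino_t ino;
};

/*
 * Returns true when the cheap lstat() comparison cannot prove the file
 * unchanged, i.e. its content would have to be opened and hashed.
 * An all-zero cache (no stat information) always fails the comparison.
 */
bool needs_content_check(const char *path, const struct cached_stat *cached)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return true; /* missing or unreadable: let the caller decide */
    return st.st_size != cached->size ||
           st.st_mtime != cached->mtime ||
           st.st_ino != cached->ino;
}

/* Records current stat data so that later checks can skip reading content. */
bool record_stat(const char *path, struct cached_stat *out)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return false;
    out->size = st.st_size;
    out->mtime = st.st_mtime;
    out->ino = st.st_ino;
    return true;
}
```

An entry whose cached stat data is missing behaves like an all-zero cached_stat here: every comparison fails, so every file would have to be read.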

Can you get a backtrace? I'd do something like:

  - gdb --args git cherry-pick ...
  - 'r' to run
  - give it a few seconds to hit the CPU heavy part, then ^C
  - 'bt' to generate the backtrace

which should give a sense of which code path is leading to the slowdown
(or of course use real profiling tools, but if the slow path is taking 6
minutes, you're likely to stop in the middle of it ;) ).

-Peff


cherry-pick very slow on big repository

2017-11-10 Thread Peter Krefting

Hi!

On a big repository (57000 files, 2.5 gigabytes in .git/objects), git 
cherry-pick is very slow for me (v2.15.0). This is cherry-picking a 
one-file change, where the file is in the same place on both branches, 
and which applies cleanly (I am backporting a few fixes to a 
maintenance version):


$ time git cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
[redacted 391454f16d] Redacted
 Author: Redacted 
 Date: Mon Oct 16 15:58:05 2017 +0200
 1 file changed, 2 insertions(+), 2 deletions(-)

real    6m9,054s
user    5m49,432s
sys     0m2,292s

Something is not right here. The repo shares objects 
(.git/objects/info/alternates) with another repository (I have run 
"git gc" on both repositories).


Running strace, it seems like it is doing lstat(), open(), mmap(), 
close() and munmap() on every single file in the repository, which 
takes a lot of time.


I thought it was just updating the status, but "git status" returns 
immediately, while cherry-picking takes several minutes for every 
cherry-pick I do.


--
\\// Peter - http://www.softwolves.pp.se/