Re: cherry-pick very slow on big repository
On Tue, Nov 21, 2017 at 4:07 AM, Peter Krefting wrote:
> Elijah Newren:
>
>> Sure, take a look at the big-repo-small-cherry-pick branch of
>> https://github.com/newren/git
>
> With those changes, the time usage is the same as if I set
> merge.renameLimit=1 for the repository, and the end result is identical:
>
> $ time /usr/local/stow/git-v2.15.0-323-g31fe956618/bin/git cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
> [redacted 19be3551bc] Redacted
>  Author: Redacted
>  Date: Mon Oct 16 15:58:05 2017 +0200
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> real    0m15,345s
> user    0m14,908s
> sys     0m0,528s
>
> Thanks!

Cool, glad it worked for you. Thanks for testing it out.
Re: cherry-pick very slow on big repository
Elijah Newren:

> Sure, take a look at the big-repo-small-cherry-pick branch of
> https://github.com/newren/git

With those changes, the time usage is the same as if I set
merge.renameLimit=1 for the repository, and the end result is identical:

$ time /usr/local/stow/git-v2.15.0-323-g31fe956618/bin/git cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
[redacted 19be3551bc] Redacted
 Author: Redacted
 Date: Mon Oct 16 15:58:05 2017 +0200
 1 file changed, 2 insertions(+), 2 deletions(-)

real    0m15,345s
user    0m14,908s
sys     0m0,528s

Thanks!

--
\\// Peter - http://www.softwolves.pp.se/
Re: cherry-pick very slow on big repository
On Mon, Nov 13, 2017 at 3:22 AM, Peter Krefting wrote:
> Elijah Newren:
>
>> I would be very interested to hear how my rename detection performance
>> patches work for you; this kind of usecase was the exact one it was designed
>> to help the most. See
>> https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/
>
> I'd be happy to try them out. Is there a public repo where I can pull these
> patches from instead of trying to apply them manually, as there are several
> patch series involved here?

Sure, take a look at the big-repo-small-cherry-pick branch of
https://github.com/newren/git
Re: cherry-pick very slow on big repository
Elijah Newren:

> I would be very interested to hear how my rename detection performance
> patches work for you; this kind of usecase was the exact one it was designed
> to help the most. See
> https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/

I'd be happy to try them out. Is there a public repo where I can pull these
patches from instead of trying to apply them manually, as there are several
patch series involved here?

--
\\// Peter - http://www.softwolves.pp.se/
RE: cherry-pick very slow on big repository
Kevin Willford:

> Since this is happening during a merge, you might need to use
> merge.renameLimit or the merge strategy option of -Xno-renames.
> Although the code does fallback to use the diff.renameLimit but there
> is still a lot that is done before even checking the rename limit so I
> would first try getting renames turned off.

That makes quite a large difference, with this setting it finishes in
just a few seconds:

$ time git -c merge.renameLimit=1 cherry-pick -x 717eb328940ca2e33f14ed27576e656327854b7b
[redacted 0576fbaf89] Redacted
 Author: Redacted
 Date: Mon Oct 16 15:58:05 2017 +0200
 1 file changed, 2 insertions(+), 2 deletions(-)

real    0m15,473s
user    0m14,904s
sys     0m0,488s

I'll add this setting for the repository for the future, thank you!

--
\\// Peter - http://www.softwolves.pp.se/
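[Editor's sketch] Making the setting permanent for one repository, as Peter says he will, could look roughly like the following. This is only a sketch: the temporary directory is a stand-in for the real working copy, and the value 1 is simply the one used in the message above.

```shell
repo=$(mktemp -d)                 # stand-in for the real working copy
git init -q "$repo"

# Written to $repo/.git/config, so only this repository is affected.
git -C "$repo" config merge.renameLimit 1

# Read the value back to confirm it was stored.
git -C "$repo" config --get merge.renameLimit
```

In an existing clone only the `git config merge.renameLimit 1` step is needed, run from inside the working copy. Passing `-c merge.renameLimit=1` on the command line, as in the timing above, applies it to a single invocation instead.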
Re: cherry-pick very slow on big repository
On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.

I would be very interested to hear how my rename detection performance
patches work for you; this kind of usecase was the exact one it was designed
to help the most. See
https://public-inbox.org/git/20171110222156.23221-1-new...@gmail.com/
Re: cherry-pick very slow on big repository
Interesting timing. I have some performance patches specifically
developed because rename detection during merges made a small
cherry-pick in a large repo rather slow...in my case, I dropped the time
for the cherry pick by a factor of about 30 (no guarantees you'll see
the same; it's very history-specific). I was just about to start
sending my three series of patches, the performance one being the
third...

On Fri, Nov 10, 2017 at 6:05 AM, Peter Krefting wrote:
> Derrick Stolee:
>
>> Git is spending time detecting renames, which implies you probably renamed
>> a folder or added and deleted a large number of files. This rename detection
>> is quadratic (# adds times # deletes).
>
> Yes, a couple of directories with a lot of template files have been renamed
> (and some removed, some added) between the current development branch and
> this old maintenance branch. I get the "Performing inexact rename detection"
> a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.
>
>> You can remove this rename detection by running your cherry-pick with `git
>> -c diff.renameLimit=1 cherry-pick ...`
>
> That didn't work, actually it failed to finish with this setting in effect,
> it hangs in such a way that I can't stop it with Ctrl+C (neither when
> running from the command line, nor when running inside gdb). It didn't
> finish in the 20 minutes I gave it.
>
> I also tried with diff.renames=false, which also seemed to fail.
>
> --
> \\// Peter - http://www.softwolves.pp.se/
RE: cherry-pick very slow on big repository
Since this is happening during a merge, you might need to use
merge.renameLimit or the merge strategy option of -Xno-renames.
Although the code does fallback to use the diff.renameLimit but there
is still a lot that is done before even checking the rename limit so I
would first try getting renames turned off.

Thanks,
Kevin

> -----Original Message-----
> From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf
> Of Peter Krefting
> Sent: Friday, November 10, 2017 7:05 AM
> To: Derrick Stolee
> Cc: Jeff King; Git Mailing List
> Subject: Re: cherry-pick very slow on big repository
>
> Derrick Stolee:
>
> > Git is spending time detecting renames, which implies you probably
> > renamed a folder or added and deleted a large number of files. This
> > rename detection is quadratic (# adds times # deletes).
>
> Yes, a couple of directories with a lot of template files have been
> renamed (and some removed, some added) between the current development
> branch and this old maintenance branch. I get the "Performing inexact
> rename detection" a lot when merging changes in the other direction.
>
> However, none of them applies to these particular commits, which only
> touches files that are in the exact same location on both branches.
>
> > You can remove this rename detection by running your cherry-pick
> > with `git -c diff.renameLimit=1 cherry-pick ...`
>
> That didn't work, actually it failed to finish with this setting in
> effect, it hangs in such a way that I can't stop it with Ctrl+C
> (neither when running from the command line, nor when running inside
> gdb). It didn't finish in the 20 minutes I gave it.
>
> I also tried with diff.renames=false, which also seemed to fail.
>
> --
> \\// Peter - http://www.softwolves.pp.se/
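[Editor's sketch] The two options Kevin names can be tried on a throwaway repository along the following lines. Everything here except `merge.renameLimit`, `-Xno-renames` and `cherry-pick -x` is made up for illustration: the file, the branch name `maint`, the commit messages and the identity are placeholders, not details from the thread.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"                    # placeholder identity
git config user.email "example@example.invalid"

echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch maint                                  # stand-in for the old maintenance branch

echo "fix 1" >> file.txt
git commit -q -am "fix 1"
fix1=$(git rev-parse HEAD)
echo "fix 2" >> file.txt
git commit -q -am "fix 2"
fix2=$(git rev-parse HEAD)

git checkout -q maint
# Option 1: cap merge-time rename detection for this one command.
git -c merge.renameLimit=1 cherry-pick -x "$fix1" >/dev/null
# Option 2: tell the merge strategy not to detect renames at all.
git cherry-pick -Xno-renames -x "$fix2" >/dev/null

git log --format=%s -2
```

On a repository this small both forms are instant, so the sketch only shows the syntax. Whether either one helps on a real history depends on how many files were added and deleted between the two branches.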
Re: cherry-pick very slow on big repository
Derrick Stolee:

> Git is spending time detecting renames, which implies you probably renamed
> a folder or added and deleted a large number of files. This rename detection
> is quadratic (# adds times # deletes).

Yes, a couple of directories with a lot of template files have been renamed
(and some removed, some added) between the current development branch and
this old maintenance branch. I get the "Performing inexact rename detection"
a lot when merging changes in the other direction.

However, none of them applies to these particular commits, which only
touches files that are in the exact same location on both branches.

> You can remove this rename detection by running your cherry-pick with
> `git -c diff.renameLimit=1 cherry-pick ...`

That didn't work, actually it failed to finish with this setting in effect,
it hangs in such a way that I can't stop it with Ctrl+C (neither when
running from the command line, nor when running inside gdb). It didn't
finish in the 20 minutes I gave it.

I also tried with diff.renames=false, which also seemed to fail.

--
\\// Peter - http://www.softwolves.pp.se/
Re: cherry-pick very slow on big repository
On 11/10/2017 7:37 AM, Peter Krefting wrote:
> Jeff King:
>
>> Can you get a backtrace? I'd do something like:
>
> Seems that it spends most time in diffcore_count_changes(), that is
> where it hits whenever I hit Ctrl+C (various line numbers 199-207 in
> diffcore-delta.c; this is on the v2.15.0 tag).
>
> (gdb) bt
> #0 diffcore_count_changes (src=src@entry=0x5db99970, dst=dst@entry=0x5d6a4810, src_count_p=src_count_p@entry=0x5db8, dst_count_p=dst_count_p@entry=0x5d6a4838, src_copied=src_copied@entry=0x7fffd3e0, literal_added=literal_added@entry=0x7fffd3f0) at diffcore-delta.c:203
> #1 0x556dee1a in estimate_similarity (minimum_score=3, dst=0x5d6a4810, src=0x5db99970) at diffcore-rename.c:193
> #2 diffcore_rename (options=options@entry=0x7fffd4f0) at diffcore-rename.c:560
> #3 0x55623d83 in diffcore_std (options=options@entry=0x7fffd4f0) at diff.c:5846
> ...

Git is spending time detecting renames, which implies you probably
renamed a folder or added and deleted a large number of files. This
rename detection is quadratic (# adds times # deletes).

You can remove this rename detection by running your cherry-pick with
`git -c diff.renameLimit=1 cherry-pick ...`

See https://git-scm.com/docs/diff-config#diff-config-diffrenameLimit

Thanks,
-Stolee
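[Editor's sketch] To get a feel for the "# adds times # deletes" cost, here is a small back-of-the-envelope calculation. The helper name and the counts are hypothetical and do not come from the thread. The product is an upper bound, since exactly matching files are paired up before the inexact comparison starts.

```shell
# Hypothetical helper: worst-case number of (deleted, added) file pairs that
# inexact rename detection may have to score.
candidate_pairs() {
    echo $(( $1 * $2 ))
}

# Made-up counts. For a real pair of commits A and B they could be taken from:
#   git diff --no-renames --diff-filter=D --name-only A B | wc -l   # deletes
#   git diff --no-renames --diff-filter=A --name-only A B | wc -l   # adds
candidate_pairs 3000 2000
```

With these made-up numbers that is six million similarity estimates, each of which calls diffcore_count_changes() as in the backtrace, for a commit that itself touches a single file.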
Re: cherry-pick very slow on big repository
Jeff King:

> Can you get a backtrace? I'd do something like:

Seems that it spends most time in diffcore_count_changes(), that is
where it hits whenever I hit Ctrl+C (various line numbers 199-207 in
diffcore-delta.c; this is on the v2.15.0 tag).

(gdb) bt
#0 diffcore_count_changes (src=src@entry=0x5db99970, dst=dst@entry=0x5d6a4810, src_count_p=src_count_p@entry=0x5db8, dst_count_p=dst_count_p@entry=0x5d6a4838, src_copied=src_copied@entry=0x7fffd3e0, literal_added=literal_added@entry=0x7fffd3f0) at diffcore-delta.c:203
#1 0x556dee1a in estimate_similarity (minimum_score=3, dst=0x5d6a4810, src=0x5db99970) at diffcore-rename.c:193
#2 diffcore_rename (options=options@entry=0x7fffd4f0) at diffcore-rename.c:560
#3 0x55623d83 in diffcore_std (options=options@entry=0x7fffd4f0) at diff.c:5846
#4 0x5564ab46 in get_renames (o=o@entry=0x7fffd850, tree=tree@entry=0x559d1b98, o_tree=o_tree@entry=0x559d1bc0, a_tree=a_tree@entry=0x559d1b98, b_tree=b_tree@entry=0x559d1b70, entries=entries@entry=0x59351d20) at merge-recursive.c:554
#5 0x5564e7d9 in merge_trees (o=o@entry=0x7fffd850, head=head@entry=0x559d1b98, merge=, merge@entry=0x559d1b70, common=, common@entry=0x559d1bc0, result=result@entry=0x7fffd830) at merge-recursive.c:1985
#6 0x5569b2cc in do_recursive_merge (opts=0x7fffdf70, msgbuf=0x7fffd810, head=0x7fffd7f0, next_label=, base_label=, next=, base=0x559c1ba0) at sequencer.c:459
#7 do_pick_commit (command=TODO_PICK, commit=commit@entry=0x559c1b60, opts=opts@entry=0x7fffdf70, final_fixup=final_fixup@entry=0) at sequencer.c:1088
#8 0x5569e324 in single_pick (opts=0x7fffdf70, cmit=0x559c1b60) at sequencer.c:2306
#9 sequencer_pick_revisions (opts=0x7fffdf70) at sequencer.c:2355
#10 0x555d4097 in run_sequencer (argc=1, argc@entry=3, argv=argv@entry=0x7fffe320, opts=, opts@entry=0x7fffdf70) at builtin/revert.c:200
#11 0x555d449a in cmd_cherry_pick (argc=3, argv=0x7fffe320, prefix=) at builtin/revert.c:225
#12 0x55567a38 in run_builtin (argv=, argc=, p=) at git.c:346
#13 handle_builtin (argc=3, argv=0x7fffe320) at git.c:554
#14 0x55567cf6 in run_argv (argv=0x7fffe0e0, argcp=0x7fffe0ec) at git.c:606
#15 cmd_main (argc=, argv=) at git.c:683
#16 0x55566e01 in main (argc=4, argv=0x7fffe318) at common-main.c:43

--
\\// Peter - http://www.softwolves.pp.se/
Re: cherry-pick very slow on big repository
On Fri, Nov 10, 2017 at 10:39:39AM +0100, Peter Krefting wrote:

> Running strace, it seems like it is doing lstat(), open(), mmap(), close()
> and munmap() on every single file in the repository, which takes a lot of
> time.
>
> I thought it was just updating the status, but "git status" returns
> immediately, while cherry-picking takes several minutes for every
> cherry-pick I do.

It kind of sounds like a temporary index is being refreshed that doesn't
have the proper stat information.

Can you get a backtrace? I'd do something like:

  - gdb --args git cherry-pick ...
  - 'r' to run
  - give it a few seconds to hit the CPU heavy part, then ^C
  - 'bt' to generate the backtrace

which should give a sense of which code path is leading to the slowdown
(or of course use real profiling tools, but if the slow path is taking 6
minutes, you'll be likely to stop in the middle of it ;) ).

-Peff
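[Editor's sketch] The same "let it run, interrupt, bt" idea can be done without an interactive gdb session by attaching to the already running process. The helper below is hypothetical and not from the thread. It assumes gdb is installed and that the system allows attaching to a process that gdb did not start, which some Linux ptrace settings forbid for non-root users.

```shell
# Hypothetical helper: start a command, wait a few seconds, then attach gdb
# once and print the stack the process is in at that moment.
sample_backtrace() {
    delay=$1
    shift
    "$@" &                         # e.g. git cherry-pick -x <commit>
    pid=$!
    sleep "$delay"                 # give it time to reach the CPU-heavy part
    if kill -0 "$pid" 2>/dev/null; then
        # -batch runs the -ex commands and exits, which detaches again.
        gdb -p "$pid" -batch -ex bt
    else
        echo "process $pid finished before the sample was taken" >&2
    fi
    wait "$pid"                    # let the command run to completion
}

# Usage (the commit id is a placeholder):
#   sample_backtrace 10 git cherry-pick -x <commit>
```

A single sample is only a hint. As Peff notes, it works here because the slow path dominates the run time. Taking a few samples, or using a real profiler, gives a more reliable picture.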