Re: [PATCH v3] add status config and command line options for rename detection
On 5/12/2018 4:04 AM, Eckhard Maaß wrote: On Fri, May 11, 2018 at 12:56:39PM +, Ben Peart wrote: After performing a merge that has conflicts git status will, by default, attempt to detect renames which causes many objects to be examined. In a virtualized repo, those objects do not exist locally so the rename logic triggers them to be fetched from the server. This results in the status call taking hours to complete on very large repos vs seconds with this patch. I see where your need comes from, but as you based this on my little patch one can achieve this already with tweaking diff.renames itself. I do wonder why there is a special need for the status command here The rename detection feature is nice and we'd like to leave it on whenever possible. The performance issues only occur when in the middle of a merge - normal status commands behave reasonably. As a result, we don't want to just turn it off completely by setting diff.renames. Until we come up with a more elegant solution, we currently turn it off completely for merge via the new merge settings and then intercept calls to status and if there is a MERGE_HEAD we turn it off for status just for that specific call. I view this as a temporary solution so would not want to put that logic into git proper as it is quite specific to when running git on a virtualized repo. if there is, I personally would like it more in a style that you could take all the options provided by diff.*-configuration and prefix that with status, eg status.diff.renames = true. What do you think? If you really only need this for merges, maybe a more specialised option is called for that only kicks in when there is a merge going on? I would like that status behaves as similar as possible to diff/show/log. Special options will pull away from that again - passing -m to show or log will lead to the same performance issues, correct? Could it be feasible to impose an overall time limit on the detection? I agree that they should behave as similar as possible which is why all the new settings default to the diff setting when not explicitly set. I believe this is a good model - if you don't do anything special you get the default/same behavior but if you know and need special behavior, you now have that option. And after writing this I wonder what were your experience with just tweaking renameLimit - setting it very low should have helped the fetching from server part already, shouldn't it? Add --no-renames command line option to status that enables overriding the config setting from the command line. Add --find-renames[=] command line option to status that enables detecting renames and optionally setting the similarity index. Would it be reasonable to extend this so that we just use the same machinery for parsing command line options for the diffcore options and pass this along? It seems to me that git status wants the same init as diff/show/log has anyway. But I like the direction towards passing more command line options to the git status command. I agree that it is unfortunate that diff/merge/status all parse and deal with config settings differently. I'd be happy to see someone tackle that and move the code to a single, coherent model but that is beyond the scope of this patch. static void wt_longstatus_print_unmerged_header(struct wt_status *s) @@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct wt_status *s) } rev.diffopt.format_callback = wt_status_collect_changed_cb; rev.diffopt.format_callback_data = s; + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename; + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit; + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score; copy_pathspec(_data, >pathspec); run_diff_files(, 0); } @@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct wt_status *s) rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK; rev.diffopt.format_callback = wt_status_collect_updated_cb; rev.diffopt.format_callback_data = s; + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename; + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit; + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score; copy_pathspec(_data, >pathspec); run_diff_index(, 1); } @@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status *s) setup_revisions(0, NULL, , ); rev.diffopt.output_format |= DIFF_FORMAT_PATCH; + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename; + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit :
Re: [PATCH v3] add status config and command line options for rename detection
On Fri, May 11, 2018 at 12:56:39PM +, Ben Peart wrote: > After performing a merge that has conflicts git status will, by default, > attempt to detect renames which causes many objects to be examined. In a > virtualized repo, those objects do not exist locally so the rename logic > triggers them to be fetched from the server. This results in the status call > taking hours to complete on very large repos vs seconds with this patch. I see where your need comes from, but as you based this on my little patch one can achieve this already with tweaking diff.renames itself. I do wonder why there is a special need for the status command here. And if there is, I personally would like it more in a style that you could take all the options provided by diff.*-configuration and prefix that with status, eg status.diff.renames = true. What do you think? If you really only need this for merges, maybe a more specialised option is called for that only kicks in when there is a merge going on? I would like that status behaves as similar as possible to diff/show/log. Special options will pull away from that again - passing -m to show or log will lead to the same performance issues, correct? Could it be feasible to impose an overall time limit on the detection? And after writing this I wonder what were your experience with just tweaking renameLimit - setting it very low should have helped the fetching from server part already, shouldn't it? > Add --no-renames command line option to status that enables overriding the > config setting from the command line. Add --find-renames[=] command line > option to status that enables detecting renames and optionally setting the > similarity index. Would it be reasonable to extend this so that we just use the same machinery for parsing command line options for the diffcore options and pass this along? It seems to me that git status wants the same init as diff/show/log has anyway. But I like the direction towards passing more command line options to the git status command. > static void wt_longstatus_print_unmerged_header(struct wt_status *s) > @@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct > wt_status *s) > } > rev.diffopt.format_callback = wt_status_collect_changed_cb; > rev.diffopt.format_callback_data = s; > + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : > rev.diffopt.detect_rename; > + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : > rev.diffopt.rename_limit; > + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : > rev.diffopt.rename_score; > copy_pathspec(_data, >pathspec); > run_diff_files(, 0); > } > @@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct > wt_status *s) > rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK; > rev.diffopt.format_callback = wt_status_collect_updated_cb; > rev.diffopt.format_callback_data = s; > + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : > rev.diffopt.detect_rename; > + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : > rev.diffopt.rename_limit; > + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : > rev.diffopt.rename_score; > copy_pathspec(_data, >pathspec); > run_diff_index(, 1); > } > @@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status > *s) > setup_revisions(0, NULL, , ); > > rev.diffopt.output_format |= DIFF_FORMAT_PATCH; > + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : > rev.diffopt.detect_rename; > + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : > rev.diffopt.rename_limit; > + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : > rev.diffopt.rename_score; > rev.diffopt.file = s->fp; > rev.diffopt.close_file = 0; > /* Somehow I am inclined that those should be factored out to a common method if the rest of the patch stays as it is. Greetings, Eckhard
Re: [PATCH v3] add status config and command line options for rename detection
Hi Ben, On Fri, May 11, 2018 at 5:56 AM, Ben Peartwrote: > After performing a merge that has conflicts git status will, by default, > attempt to detect renames which causes many objects to be examined. In a > virtualized repo, those objects do not exist locally so the rename logic > triggers them to be fetched from the server. This results in the status call > taking hours to complete on very large repos vs seconds with this patch. > > Add a new config status.renames setting to enable turning off rename detection > during status and commit. This setting will default to the value of > diff.renames. > > Add a new config status.renamelimit setting to to enable bounding the time > spent finding out inexact renames during status and commit. This setting will > default to the value of diff.renamelimit. > > Add --no-renames command line option to status that enables overriding the > config setting from the command line. Add --find-renames[=] command line > option to status that enables detecting renames and optionally setting the > similarity index. Any chance I could get you to re-wrap this at a smaller column width? Doesn't fit in my (80-char) terminal when I run `git log`; a couple lines run over by a couple characters. (Sorry for not noticing this earlier) > This patch depends on em/status-rename-config I'd leave this line for the notes. It's useful information now, but won't be to someone looking at the commit a year from now, so it probably doesn't belong in the commit message. With those two changes: Reviewed-by: Elijah Newren
[PATCH v3] add status config and command line options for rename detection
After performing a merge that has conflicts git status will, by default, attempt to detect renames which causes many objects to be examined. In a virtualized repo, those objects do not exist locally so the rename logic triggers them to be fetched from the server. This results in the status call taking hours to complete on very large repos vs seconds with this patch. Add a new config status.renames setting to enable turning off rename detection during status and commit. This setting will default to the value of diff.renames. Add a new config status.renamelimit setting to to enable bounding the time spent finding out inexact renames during status and commit. This setting will default to the value of diff.renamelimit. Add --no-renames command line option to status that enables overriding the config setting from the command line. Add --find-renames[=] command line option to status that enables detecting renames and optionally setting the similarity index. This patch depends on em/status-rename-config Original-Patch-by: Alejandro PaulySigned-off-by: Ben Peart --- Notes: Base Ref: commit dc6b1d92ca9c0c538daa244e3910bb8b2a50d959 (em/status-rename-config) Web-Diff: https://github.com/benpeart/git/commit/5bac43610b Checkout: git fetch https://github.com/benpeart/git status-renames-v3 && git checkout 5bac43610b ### Interdiff (v2..v3): diff --git a/Documentation/config.txt b/Documentation/config.txt index 9c8eca05b1..4f1ead 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3120,14 +3120,16 @@ status.displayCommentPrefix:: Defaults to false. status.renameLimit:: - The number of files to consider when performing rename detection; - if not specified, defaults to the value of diff.renameLimit. + The number of files to consider when performing rename detection + in linkgit:git-status[1] and linkgit:git-commit[1]. Defaults to + the value of diff.renameLimit. status.renames:: - Whether and how Git detects renames. If set to "false", - rename detection is disabled. If set to "true", basic rename - detection is enabled. If set to "copies" or "copy", Git will - detect copies, as well. Defaults to the value of diff.renames. + Whether and how Git detects renames in linkgit:git-status[1] and + linkgit:git-commit[1] . If set to "false", rename detection is + disabled. If set to "true", basic rename detection is enabled. + If set to "copies" or "copy", Git will detect copies, as well. + Defaults to the value of diff.renames. status.showStash:: If set to true, linkgit:git-status[1] will display the number of diff --git a/builtin/commit.c b/builtin/commit.c index db886277f4..b50e33ef48 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1373,6 +1373,7 @@ int cmd_status(int argc, const char **argv, const char *prefix) if (no_renames != -1) s.detect_rename = !no_renames; if ((intptr_t)rename_score_arg != -1) { + if (s.detect_rename < DIFF_DETECT_RENAME) s.detect_rename = DIFF_DETECT_RENAME; if (rename_score_arg) s.rename_score = parse_rename_score(_score_arg); diff --git a/t/t7525-status-rename.sh b/t/t7525-status-rename.sh old mode 100644 new mode 100755 index 311df8038a..ef8b1b3078 --- a/t/t7525-status-rename.sh +++ b/t/t7525-status-rename.sh @@ -10,14 +10,13 @@ test_expect_success 'setup' ' git commit -m"Adding original file." && mv original renamed && echo 2 >> renamed && - git add . -' - -cat >.gitignore <<\EOF + git add . && + cat >.gitignore <<-\EOF .gitignore expect* actual* EOF +' test_expect_success 'status no-options' ' git status >actual && @@ -63,7 +62,18 @@ test_expect_success 'status status.renames=true' ' test_i18ngrep "renamed:" actual ' -test_expect_success 'status config overriden' ' +test_expect_success 'commit honors status.renames=false' ' + git -c status.renames=false commit --dry-run >actual && + test_i18ngrep "deleted:" actual && + test_i18ngrep "new file:" actual +' + +test_expect_success 'commit honors status.renames=true' ' + git -c status.renames=true commit --dry-run >actual && + test_i18ngrep "renamed:" actual +' + +test_expect_success 'status config overridden' ' git -c status.renames=true status --no-renames >actual && test_i18ngrep "deleted:" actual && test_i18ngrep "new file:" actual @@ -87,4 +97,17 @@ test_expect_success 'status score=01%' ' test_i18ngrep "renamed:" actual ' +test_expect_success 'copies not overridden by find-rename' '