Re: [PATCH v3] add status config and command line options for rename detection

2018-05-14 Thread Ben Peart



On 5/12/2018 4:04 AM, Eckhard Maaß wrote:

On Fri, May 11, 2018 at 12:56:39PM +, Ben Peart wrote:

After performing a merge that has conflicts git status will, by default,
attempt to detect renames which causes many objects to be examined.  In a
virtualized repo, those objects do not exist locally so the rename logic
triggers them to be fetched from the server. This results in the status call
taking hours to complete on very large repos vs seconds with this patch.


I see where your need comes from, but as you based this on my little
patch one can achieve this already with tweaking diff.renames itself. I
do wonder why there is a special need for the status command here


The rename detection feature is nice and we'd like to leave it on 
whenever possible.  The performance issues only occur when in the middle 
of a merge - normal status commands behave reasonably.  As a result, we 
don't want to just turn it off completely by setting diff.renames.


Until we come up with a more elegant solution, we currently turn it off 
completely for merge via the new merge settings and then intercept calls 
to status and if there is a MERGE_HEAD we turn it off for status just 
for that specific call.  I view this as a temporary solution so would 
not want to put that logic into git proper as it is quite specific to 
when running git on a virtualized repo.



if there is, I personally would like it more in a style that you could
take all the options provided by diff.*-configuration and prefix that
with status, eg status.diff.renames = true. What do you think? If you
really only need this for merges, maybe a more specialised option is
called for that only kicks in when there is a merge going on?

I would like that status behaves as similar as possible to
diff/show/log. Special options will pull away from that again - passing
-m to show or log will lead to the same performance issues, correct?
Could it be feasible to impose an overall time limit on the detection?



I agree that they should behave as similar as possible which is why all 
the new settings default to the diff setting when not explicitly set.  I 
believe this is a good model - if you don't do anything special you get 
the default/same behavior but if you know and need special behavior, you 
now have that option.



And after writing this I wonder what were your experience with just
tweaking renameLimit - setting it very low should have helped the
fetching from server part already, shouldn't it?


Add --no-renames command line option to status that enables overriding the
config setting from the command line. Add --find-renames[=] command line
option to status that enables detecting renames and optionally setting the
similarity index.


Would it be reasonable to extend this so that we just use the same
machinery for parsing command line options for the diffcore options and
pass this along? It seems to me that git status wants the same init as
diff/show/log has anyway. But I like the direction towards passing more
command line options to the git status command.



I agree that it is unfortunate that diff/merge/status all parse and deal 
with config settings differently.  I'd be happy to see someone tackle 
that and move the code to a single, coherent model but that is beyond 
the scope of this patch.



  static void wt_longstatus_print_unmerged_header(struct wt_status *s)
@@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct 
wt_status *s)
}
rev.diffopt.format_callback = wt_status_collect_changed_cb;
rev.diffopt.format_callback_data = s;
+   rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
rev.diffopt.detect_rename;
+   rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
rev.diffopt.rename_limit;
+   rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
rev.diffopt.rename_score;
copy_pathspec(_data, >pathspec);
run_diff_files(, 0);
  }
@@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct 
wt_status *s)
rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = wt_status_collect_updated_cb;
rev.diffopt.format_callback_data = s;
+   rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
rev.diffopt.detect_rename;
+   rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
rev.diffopt.rename_limit;
+   rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
rev.diffopt.rename_score;
copy_pathspec(_data, >pathspec);
run_diff_index(, 1);
  }
@@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
setup_revisions(0, NULL, , );
  
  	rev.diffopt.output_format |= DIFF_FORMAT_PATCH;

+   rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
rev.diffopt.detect_rename;
+   rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 

Re: [PATCH v3] add status config and command line options for rename detection

2018-05-12 Thread Eckhard Maaß
On Fri, May 11, 2018 at 12:56:39PM +, Ben Peart wrote:
> After performing a merge that has conflicts git status will, by default,
> attempt to detect renames which causes many objects to be examined.  In a
> virtualized repo, those objects do not exist locally so the rename logic
> triggers them to be fetched from the server. This results in the status call
> taking hours to complete on very large repos vs seconds with this patch.

I see where your need comes from, but as you based this on my little
patch one can achieve this already with tweaking diff.renames itself. I
do wonder why there is a special need for the status command here. And
if there is, I personally would like it more in a style that you could
take all the options provided by diff.*-configuration and prefix that
with status, eg status.diff.renames = true. What do you think? If you
really only need this for merges, maybe a more specialised option is
called for that only kicks in when there is a merge going on?

I would like that status behaves as similar as possible to
diff/show/log. Special options will pull away from that again - passing
-m to show or log will lead to the same performance issues, correct?
Could it be feasible to impose an overall time limit on the detection?

And after writing this I wonder what were your experience with just
tweaking renameLimit - setting it very low should have helped the
fetching from server part already, shouldn't it?

> Add --no-renames command line option to status that enables overriding the
> config setting from the command line. Add --find-renames[=] command line
> option to status that enables detecting renames and optionally setting the
> similarity index.

Would it be reasonable to extend this so that we just use the same
machinery for parsing command line options for the diffcore options and
pass this along? It seems to me that git status wants the same init as
diff/show/log has anyway. But I like the direction towards passing more
command line options to the git status command. 

>  static void wt_longstatus_print_unmerged_header(struct wt_status *s)
> @@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct 
> wt_status *s)
>   }
>   rev.diffopt.format_callback = wt_status_collect_changed_cb;
>   rev.diffopt.format_callback_data = s;
> + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
> rev.diffopt.detect_rename;
> + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
> rev.diffopt.rename_limit;
> + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
> rev.diffopt.rename_score;
>   copy_pathspec(_data, >pathspec);
>   run_diff_files(, 0);
>  }
> @@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct 
> wt_status *s)
>   rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
>   rev.diffopt.format_callback = wt_status_collect_updated_cb;
>   rev.diffopt.format_callback_data = s;
> + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
> rev.diffopt.detect_rename;
> + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
> rev.diffopt.rename_limit;
> + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
> rev.diffopt.rename_score;
>   copy_pathspec(_data, >pathspec);
>   run_diff_index(, 1);
>  }
> @@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status 
> *s)
>   setup_revisions(0, NULL, , );
>  
>   rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
> + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
> rev.diffopt.detect_rename;
> + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
> rev.diffopt.rename_limit;
> + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
> rev.diffopt.rename_score;
>   rev.diffopt.file = s->fp;
>   rev.diffopt.close_file = 0;
>   /*

Somehow I am inclined that those should be factored out to a common
method if the rest of the patch stays as it is.

Greetings,
Eckhard


Re: [PATCH v3] add status config and command line options for rename detection

2018-05-11 Thread Elijah Newren
Hi Ben,

On Fri, May 11, 2018 at 5:56 AM, Ben Peart  wrote:
> After performing a merge that has conflicts git status will, by default,
> attempt to detect renames which causes many objects to be examined.  In a
> virtualized repo, those objects do not exist locally so the rename logic
> triggers them to be fetched from the server. This results in the status call
> taking hours to complete on very large repos vs seconds with this patch.
>
> Add a new config status.renames setting to enable turning off rename detection
> during status and commit.  This setting will default to the value of
> diff.renames.
>
> Add a new config status.renamelimit setting to to enable bounding the time
> spent finding out inexact renames during status and commit.  This setting will
> default to the value of diff.renamelimit.
>
> Add --no-renames command line option to status that enables overriding the
> config setting from the command line. Add --find-renames[=] command line
> option to status that enables detecting renames and optionally setting the
> similarity index.

Any chance I could get you to re-wrap this at a smaller column width?
Doesn't fit in my (80-char) terminal when I run `git log`; a couple
lines run over by a couple characters.  (Sorry for not noticing this
earlier)

> This patch depends on em/status-rename-config

I'd leave this line for the notes.  It's useful information now, but
won't be to someone looking at the commit a year from now, so it
probably doesn't belong in the commit message.


With those two changes:
  Reviewed-by: Elijah Newren 


[PATCH v3] add status config and command line options for rename detection

2018-05-11 Thread Ben Peart
After performing a merge that has conflicts git status will, by default,
attempt to detect renames which causes many objects to be examined.  In a
virtualized repo, those objects do not exist locally so the rename logic
triggers them to be fetched from the server. This results in the status call
taking hours to complete on very large repos vs seconds with this patch.

Add a new config status.renames setting to enable turning off rename detection
during status and commit.  This setting will default to the value of
diff.renames.

Add a new config status.renamelimit setting to to enable bounding the time
spent finding out inexact renames during status and commit.  This setting will
default to the value of diff.renamelimit.

Add --no-renames command line option to status that enables overriding the
config setting from the command line. Add --find-renames[=] command line
option to status that enables detecting renames and optionally setting the
similarity index.

This patch depends on em/status-rename-config

Original-Patch-by: Alejandro Pauly 
Signed-off-by: Ben Peart 
---

Notes:
Base Ref: commit dc6b1d92ca9c0c538daa244e3910bb8b2a50d959 
(em/status-rename-config)
Web-Diff: https://github.com/benpeart/git/commit/5bac43610b
Checkout: git fetch https://github.com/benpeart/git status-renames-v3 && 
git checkout 5bac43610b

### Interdiff (v2..v3):

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9c8eca05b1..4f1ead 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3120,14 +3120,16 @@ status.displayCommentPrefix::
Defaults to false.

 status.renameLimit::
-   The number of files to consider when performing rename detection;
-   if not specified, defaults to the value of diff.renameLimit.
+   The number of files to consider when performing rename detection
+   in linkgit:git-status[1] and linkgit:git-commit[1]. Defaults to
+   the value of diff.renameLimit.

 status.renames::
-   Whether and how Git detects renames.  If set to "false",
-   rename detection is disabled. If set to "true", basic rename
-   detection is enabled.  If set to "copies" or "copy", Git will
-   detect copies, as well.  Defaults to the value of diff.renames.
+   Whether and how Git detects renames in linkgit:git-status[1] and
+   linkgit:git-commit[1] .  If set to "false", rename detection is
+   disabled. If set to "true", basic rename detection is enabled.
+   If set to "copies" or "copy", Git will detect copies, as well.
+   Defaults to the value of diff.renames.

 status.showStash::
If set to true, linkgit:git-status[1] will display the number of
diff --git a/builtin/commit.c b/builtin/commit.c
index db886277f4..b50e33ef48 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1373,6 +1373,7 @@ int cmd_status(int argc, const char **argv, const 
char *prefix)
if (no_renames != -1)
s.detect_rename = !no_renames;
if ((intptr_t)rename_score_arg != -1) {
+   if (s.detect_rename < DIFF_DETECT_RENAME)
s.detect_rename = DIFF_DETECT_RENAME;
if (rename_score_arg)
s.rename_score = parse_rename_score(_score_arg);
diff --git a/t/t7525-status-rename.sh b/t/t7525-status-rename.sh
old mode 100644
new mode 100755
index 311df8038a..ef8b1b3078
--- a/t/t7525-status-rename.sh
+++ b/t/t7525-status-rename.sh
@@ -10,14 +10,13 @@ test_expect_success 'setup' '
git commit -m"Adding original file." &&
mv original renamed &&
echo 2 >> renamed &&
-   git add .
-'
-
-cat >.gitignore <<\EOF
+   git add . &&
+   cat >.gitignore <<-\EOF
.gitignore
expect*
actual*
EOF
+'

 test_expect_success 'status no-options' '
git status >actual &&
@@ -63,7 +62,18 @@ test_expect_success 'status status.renames=true' '
test_i18ngrep "renamed:" actual
 '

-test_expect_success 'status config overriden' '
+test_expect_success 'commit honors status.renames=false' '
+   git -c status.renames=false commit --dry-run >actual &&
+   test_i18ngrep "deleted:" actual &&
+   test_i18ngrep "new file:" actual
+'
+
+test_expect_success 'commit honors status.renames=true' '
+   git -c status.renames=true commit --dry-run >actual &&
+   test_i18ngrep "renamed:" actual
+'
+
+test_expect_success 'status config overridden' '
git -c status.renames=true status --no-renames >actual &&
test_i18ngrep "deleted:" actual &&
test_i18ngrep "new file:" actual
@@ -87,4 +97,17 @@ test_expect_success 'status score=01%' '
test_i18ngrep "renamed:" actual
 '

+test_expect_success 'copies not overridden by find-rename' '