Re: [PATCH v1] add status config and command line options for rename detection

2018-05-09 Thread Ben Peart



On 5/9/2018 12:56 PM, Elijah Newren wrote:

Hi Ben,

Overall I think this is good, but I have lots of nit-picky things to
bring up.  :-)




Thank you for the review.  I appreciate the extra set of eyes on these 
changes.  Especially when dealing with the merge logic and settings 
which I am unfamiliar with.




I suspect that status.renames should mention "copy", just as
diff.renames does.  (We didn't mention it in merge.renames, because
merge isn't an operation for which copy detection can be useful -- at
least not until the diffstat at the end of the merge is controlled by
merge.renames as well...)



I wasn't supporting copy (as you noticed later in the patch) but will 
update the patch to do so and update the documentation appropriately.



Also, do these two config settings only affect 'git status', or does
it also affect the status shown when composing a commit message
(assuming the user hasn't turned commit.status off)?  Does it affect
`git commit --dry-run` too?



The config settings only affect 'git status'


--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -109,6 +109,10 @@ static int have_option_m;
  static struct strbuf message = STRBUF_INIT;

  static enum wt_status_format status_format = STATUS_FORMAT_UNSPECIFIED;
+static int diff_detect_rename = -1;
+static int status_detect_rename = -1;
+static int diff_rename_limit = -1;
+static int status_rename_limit = -1;


Could we replace these four variables with just two: detect_rename and
rename_limit?  Keeping these separate invites people to write code
using only one of the settings rather than the appropriate inherited
mixture of them, resulting in a weird bug.  More on this below...



This model was inherited from the diff code and replicated to the merge 
code.  However, it would be nice to get rid of these 4 static variables. 
 See below for a proposal on how to do that...



@@ -1259,11 +1273,29 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
 return error(_("Invalid untracked files mode '%s'"), 
v);
 return 0;
 }
+   if (!strcmp(k, "diff.renamelimit")) {
+   diff_rename_limit = git_config_int(k, v);
+   return 0;
+   }
+   if (!strcmp(k, "status.renamelimit")) {
+   status_rename_limit = git_config_int(k, v);
+   return 0;
+   }


Here, since you're already checking diff.renamelimit first, you can
just set rename_limit in both blocks and not need both
diff_rename_limit and status_rename_limit.  (Similar can be said for
diff.renames/status.renames.)


It really doesn't work that way - the call back is called for every 
config setting and there is no specific order they are called with. 
Typically, you just test for and save off any that you care about like 
I"m doing here.


I can update the logic here so that as I save off the settings that it 
will also enforce the priority model (ie the diff setting can't override 
the status setting) and then compute the final value once I have the 
command line arguments as they override either config setting (if present).


On the plus side, this change passes the red/green test but it now 
splits the priority logic into two places and doesn't match with how 
diff and merge handle this same issue.







@@ -1297,6 +1329,10 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
   N_("ignore changes to submodules, optional when: all, dirty, 
untracked. (Default: all)"),
   PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
 OPT_COLUMN(0, "column", , N_("list untracked files in 
columns")),
+   OPT_BOOL(0, "no-renames", _renames, N_("do not detect 
renames")),
+   { OPTION_CALLBACK, 'M', "find-renames", _score_arg,
+ N_("n"), N_("detect renames, optionally set similarity 
index"),
+ PARSE_OPT_OPTARG, opt_parse_rename_score },


Should probably also document these options in
Documentation/git-status.txt (and maybe Documentation/git-commit.txt
as well).


Good point, will do.



Not sure if we want to add a flag for copy detection (similar to
git-diff's -C/--find-copies and --find-copies-harder), or just leave
that for when someone finds a need.  If left out, might want to just
mention that it was considered and intentionally omitted for now in
the commit message.



I tend to only implement the features I know are actually needed so 
intentionally omitted this (along with many other potential diff options).



+   if ((intptr_t)rename_score_arg != -1) {


I don't understand why rename_score_arg is a (char*) and then you need
to cast to intptr_t, but that may just be because I haven't done much
of anything with option parsing.  A quick look at the docs isn't
making it clear to me, though; could you enlighten me?



Yes, it is related to making parse_options() do what we need.  -1 means 
the command line option wasn't passed so use the default.  NULL 

Re: [PATCH v1] add status config and command line options for rename detection

2018-05-09 Thread Ben Peart



On 5/9/2018 11:59 AM, Duy Nguyen wrote:

On Wed, May 9, 2018 at 4:42 PM, Ben Peart  wrote:

Add a new config status.renames setting to enable turning off rename detection
during status.  This setting will default to the value of diff.renames.


Please add the reason you need this config key in the commit message.
My guess (probably correct) is on super large repo (how large?),
rename detection is just too slow (how long?) that it practically
makes git-status unusable.



Yes, the reasons for this change are the same as for the patch that 
added these same flags for merge and have to do with the poor 
performance of rename detection with large repos.  I'll update the 
commit message to be more descriptive (see below) and correct some 
spelling errors.



add status config and command line options for rename detection

After performing a merge that has conflicts, git status will by default 
attempt to detect renames which causes many objects to be examined.  In 
a virtualized repo, those objects do not exist locally so the rename 
logic triggers them to be fetched from the server. This results in the 
status call taking hours to complete on very large repos.  Even in a 
small repo (the GVFS repo) turning off break and rename detection has a 
significant impact:


git status --no-renames:
31 secs., 105 loose object downloads

git status --no-breaks
7 secs., 17 loose object downloads

git status --no-breaks --no-renames
1 sec., 1 loose object download

Add a new config status.renames setting to enable turning off rename 
detection during status.  This setting will default to the value of 
diff.renames.


Add a new config status.renamelimit setting to to enable bounding the 
time spent finding out inexact renames during status.  This setting will 
default to the value of diff.renamelimit.


Add status --no-renames command line option that enables overriding the 
config setting from the command line. Add --find-renames[=] to enable 
detecting renames and optionally setting the similarity index from the 
command line.


Note: I removed the --no-breaks command line option from the original 
patch as it will no longer be needed once the default has been changed 
[1] to turn it off.


[1] 
https://public-inbox.org/git/20180430093421.27551-2-eckhard.s.ma...@gmail.com/


Original-Patch-by: Alejandro Pauly 
Signed-off-by: Ben Peart 



This information could be helpful when we optimize rename detection to
be more efficient.



Add a new config status.renamelimit setting to to enable bounding the time spent
finding out inexact renames during status.  This setting will default to the
value of diff.renamelimit.

Add status --no-renames command line option that enables overriding the config
setting from the command line. Add --find-renames[=] to enable detecting
renames and optionaly setting the similarity index from the command line.

Origional-Patch-by: Alejandro Pauly 
Signed-off-by: Ben Peart 


Re: [PATCH v1] add status config and command line options for rename detection

2018-05-09 Thread Elijah Newren
Hi Ben,

Overall I think this is good, but I have lots of nit-picky things to
bring up.  :-)


On Wed, May 9, 2018 at 7:42 AM, Ben Peart  wrote:
> Add status --no-renames command line option that enables overriding the config
> setting from the command line. Add --find-renames[=] to enable detecting
> renames and optionaly setting the similarity index from the command line.

s/optionaly/optionally/

> Notes:
> Base Ref:

No base ref?  ;-)

> +status.renameLimit::
> +   The number of files to consider when performing rename detection;
> +   if not specified, defaults to the value of diff.renameLimit.
> +
> +status.renames::
> +   Whether and how Git detects renames.  If set to "false",
> +   rename detection is disabled. If set to "true", basic rename
> +   detection is enabled.  Defaults to the value of diff.renames.

I suspect that status.renames should mention "copy", just as
diff.renames does.  (We didn't mention it in merge.renames, because
merge isn't an operation for which copy detection can be useful -- at
least not until the diffstat at the end of the merge is controlled by
merge.renames as well...)

Also, do these two config settings only affect 'git status', or does
it also affect the status shown when composing a commit message
(assuming the user hasn't turned commit.status off)?  Does it affect
`git commit --dry-run` too?

> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -109,6 +109,10 @@ static int have_option_m;
>  static struct strbuf message = STRBUF_INIT;
>
>  static enum wt_status_format status_format = STATUS_FORMAT_UNSPECIFIED;
> +static int diff_detect_rename = -1;
> +static int status_detect_rename = -1;
> +static int diff_rename_limit = -1;
> +static int status_rename_limit = -1;

Could we replace these four variables with just two: detect_rename and
rename_limit?  Keeping these separate invites people to write code
using only one of the settings rather than the appropriate inherited
mixture of them, resulting in a weird bug.  More on this below...

> @@ -1259,11 +1273,29 @@ static int git_status_config(const char *k, const 
> char *v, void *cb)
> return error(_("Invalid untracked files mode '%s'"), 
> v);
> return 0;
> }
> +   if (!strcmp(k, "diff.renamelimit")) {
> +   diff_rename_limit = git_config_int(k, v);
> +   return 0;
> +   }
> +   if (!strcmp(k, "status.renamelimit")) {
> +   status_rename_limit = git_config_int(k, v);
> +   return 0;
> +   }

Here, since you're already checking diff.renamelimit first, you can
just set rename_limit in both blocks and not need both
diff_rename_limit and status_rename_limit.  (Similar can be said for
diff.renames/status.renames.)



> @@ -1297,6 +1329,10 @@ int cmd_status(int argc, const char **argv, const char 
> *prefix)
>   N_("ignore changes to submodules, optional when: all, 
> dirty, untracked. (Default: all)"),
>   PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
> OPT_COLUMN(0, "column", , N_("list untracked files 
> in columns")),
> +   OPT_BOOL(0, "no-renames", _renames, N_("do not detect 
> renames")),
> +   { OPTION_CALLBACK, 'M', "find-renames", _score_arg,
> + N_("n"), N_("detect renames, optionally set similarity 
> index"),
> + PARSE_OPT_OPTARG, opt_parse_rename_score },

Should probably also document these options in
Documentation/git-status.txt (and maybe Documentation/git-commit.txt
as well).

Not sure if we want to add a flag for copy detection (similar to
git-diff's -C/--find-copies and --find-copies-harder), or just leave
that for when someone finds a need.  If left out, might want to just
mention that it was considered and intentionally omitted for now in
the commit message.

> @@ -1336,6 +1372,27 @@ int cmd_status(int argc, const char **argv, const char 
> *prefix)
> s.ignore_submodule_arg = ignore_submodule_arg;
> s.status_format = status_format;
> s.verbose = verbose;
> +   s.detect_rename = no_renames >= 0 ? !no_renames :
> + status_detect_rename >= 0 ? 
> status_detect_rename :
> + diff_detect_rename >= 0 ? 
> diff_detect_rename :

Combining the four vars into two as mentioned above would allow
combining the last two lines above into one.

> +   if ((intptr_t)rename_score_arg != -1) {

I don't understand why rename_score_arg is a (char*) and then you need
to cast to intptr_t, but that may just be because I haven't done much
of anything with option parsing.  A quick look at the docs isn't
making it clear to me, though; could you enlighten me?

> +   s.detect_rename = DIFF_DETECT_RENAME;

What if status.renames is 'copy' but someone wants to override the
score for detecting renames and pass --find-renames=40?  Does the
--find-renames 

Re: [PATCH v1] add status config and command line options for rename detection

2018-05-09 Thread Duy Nguyen
On Wed, May 9, 2018 at 4:42 PM, Ben Peart  wrote:
> Add a new config status.renames setting to enable turning off rename detection
> during status.  This setting will default to the value of diff.renames.

Please add the reason you need this config key in the commit message.
My guess (probably correct) is on super large repo (how large?),
rename detection is just too slow (how long?) that it practically
makes git-status unusable.

This information could be helpful when we optimize rename detection to
be more efficient.

>
> Add a new config status.renamelimit setting to to enable bounding the time 
> spent
> finding out inexact renames during status.  This setting will default to the
> value of diff.renamelimit.
>
> Add status --no-renames command line option that enables overriding the config
> setting from the command line. Add --find-renames[=] to enable detecting
> renames and optionaly setting the similarity index from the command line.
>
> Origional-Patch-by: Alejandro Pauly 
> Signed-off-by: Ben Peart 
-- 
Duy


[PATCH v1] add status config and command line options for rename detection

2018-05-09 Thread Ben Peart
Add a new config status.renames setting to enable turning off rename detection
during status.  This setting will default to the value of diff.renames.

Add a new config status.renamelimit setting to to enable bounding the time spent
finding out inexact renames during status.  This setting will default to the
value of diff.renamelimit.

Add status --no-renames command line option that enables overriding the config
setting from the command line. Add --find-renames[=] to enable detecting
renames and optionaly setting the similarity index from the command line.

Origional-Patch-by: Alejandro Pauly 
Signed-off-by: Ben Peart 
---

Notes:
Base Ref:
Web-Diff: https://github.com/benpeart/git/commit/aa977d2964
Checkout: git fetch https://github.com/benpeart/git status-renames-v1 && 
git checkout aa977d2964

 Documentation/config.txt |  9 
 builtin/commit.c | 57 +
 diff.c   |  2 +-
 diff.h   |  1 +
 t/t7525-status-rename.sh | 90 
 wt-status.c  | 12 ++
 wt-status.h  |  4 +-
 7 files changed, 173 insertions(+), 2 deletions(-)
 create mode 100644 t/t7525-status-rename.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..b79b83c587 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3119,6 +3119,15 @@ status.displayCommentPrefix::
behavior of linkgit:git-status[1] in Git 1.8.4 and previous.
Defaults to false.
 
+status.renameLimit::
+   The number of files to consider when performing rename detection;
+   if not specified, defaults to the value of diff.renameLimit.
+
+status.renames::
+   Whether and how Git detects renames.  If set to "false",
+   rename detection is disabled. If set to "true", basic rename
+   detection is enabled.  Defaults to the value of diff.renames.
+
 status.showStash::
If set to true, linkgit:git-status[1] will display the number of
entries currently stashed away.
diff --git a/builtin/commit.c b/builtin/commit.c
index 5240f11225..a545096ddd 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -109,6 +109,10 @@ static int have_option_m;
 static struct strbuf message = STRBUF_INIT;
 
 static enum wt_status_format status_format = STATUS_FORMAT_UNSPECIFIED;
+static int diff_detect_rename = -1;
+static int status_detect_rename = -1;
+static int diff_rename_limit = -1;
+static int status_rename_limit = -1;
 
 static int opt_parse_porcelain(const struct option *opt, const char *arg, int 
unset)
 {
@@ -143,6 +147,16 @@ static int opt_parse_m(const struct option *opt, const 
char *arg, int unset)
return 0;
 }
 
+static int opt_parse_rename_score(const struct option *opt, const char *arg, 
int unset)
+{
+   const char **value = opt->value;
+   if (arg != NULL && *arg == '=')
+   arg = arg + 1;
+
+   *value = arg;
+   return 0;
+}
+
 static void determine_whence(struct wt_status *s)
 {
if (file_exists(git_path_merge_head()))
@@ -1259,11 +1273,29 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
return error(_("Invalid untracked files mode '%s'"), v);
return 0;
}
+   if (!strcmp(k, "diff.renamelimit")) {
+   diff_rename_limit = git_config_int(k, v);
+   return 0;
+   }
+   if (!strcmp(k, "status.renamelimit")) {
+   status_rename_limit = git_config_int(k, v);
+   return 0;
+   }
+   if (!strcmp(k, "diff.renames")) {
+   diff_detect_rename = git_config_rename(k, v);
+   return 0;
+   }
+   if (!strcmp(k, "status.renames")) {
+   status_detect_rename = git_config_rename(k, v);
+   return 0;
+   }
return git_diff_ui_config(k, v, NULL);
 }
 
 int cmd_status(int argc, const char **argv, const char *prefix)
 {
+   static int no_renames = -1;
+   static const char *rename_score_arg = (const char *)-1;
static struct wt_status s;
int fd;
struct object_id oid;
@@ -1297,6 +1329,10 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
  N_("ignore changes to submodules, optional when: all, dirty, 
untracked. (Default: all)"),
  PARSE_OPT_OPTARG, NULL, (intptr_t)"all" },
OPT_COLUMN(0, "column", , N_("list untracked files in 
columns")),
+   OPT_BOOL(0, "no-renames", _renames, N_("do not detect 
renames")),
+   { OPTION_CALLBACK, 'M', "find-renames", _score_arg,
+ N_("n"), N_("detect renames, optionally set similarity 
index"),
+ PARSE_OPT_OPTARG, opt_parse_rename_score },
OPT_END(),
};
 
@@ -1336,6 +1372,27 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)