Re: [RFC PATCH] Introduce "precious" file concept

2018-11-26 Thread Eckhard Maaß
On Mon, Nov 12, 2018 at 11:22:09PM +, brian m. carlson wrote:
> This is going to totally hose automation.  My last job had files which
> might move from tracked to untracked (a file that had become generated),
> and long-running CI and build systems would need to be able to check out
> one status and switch to the other.  Your proposed change will prevent
> those systems from working, whereas they previously did.

Wouldn't those systems not use -f right now? And shouldn't Git have the
same semantic for -f to clobber everything in the proposed use case?
Like it does right now for untracked files which are not ignored. So to
be save going back and forth I would expect those systems to use -f
anyway. Have I missed something here?

Regards,
Eckhard


Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content

2018-09-18 Thread Eckhard Maaß
On Tue, Sep 18, 2018 at 12:41:30PM -0700, Jacob Keller wrote:
> I think having both is good. There are a lot of ways to accidentally
> throw away work, and it's pretty frustrating to have it happen. But
> the reflog is also somewhat complicated, and I've definitely seen a
> lot of developers who've never heard of it, and struggle with the
> concept.

It's definitely good to improve on "oh, I screwed up - how can I
recover?" part. 

> I personally think having the nice "it looks like you're about to
> throw away all your changes, are you sure" style of protection using
> something like --clobber-index is useful as a mode, even if we have an
> index log of sorts.

I don't think it's an appealing design. If we have the index reflog
giving us an undo function, then it is (at least for me) irritating to
have additional protection. Furthermore, this protection, to not break
existing workflows, needs to be configurable. This comes with much
baggage on its own. Having then two things which seem to solve the same
problem setting, but somehow different, is in my opinion even more
confusing in the long run.

Greetings,
Eckhard


Re: [Possible GIT Bug]

2018-09-10 Thread Eckhard Maaß
On Mon, Sep 10, 2018 at 09:03:10AM -0700, Junio C Hamano wrote:
> It is neither but if I have to pick one between the two, it is much
> closer to the former than the latter.  The primary source of this is
> that we have only *one* pathspec given to the diff machinery, but in
> order to implement your ideal "find harder", you'd need *two*.  That
> is, one set of paths for which you are interested in their origin,
> and the other set that you allow the machinery to consider as possible
> origins.  Since we can only give one pathspec machinery, that one
> pathspec is used to specify both of these sets.

How does tihs compare to `--follow`? With that knob active the machinery
indeed uses the whole repository for finding renames and/or copies. Is
this the only exception then?

Take care,
Eckhard


Re: How to delete files and directories from git commit history?

2018-06-12 Thread Eckhard Maaß
On Tue, Jun 12, 2018 at 03:44:13PM -0400, Steve Litt wrote:
> git filter-branch --tree-filter 'rm -rf docs/propdir' HEAD
...
> What command do I do to remove all mention of doc/propdir and its
> files from my git history?

Are you sure that you pruned all branches? I would have expected a
command like "git filter-branch ... -- --all" to prune every branch from
the directories.

Greetings,
Eckhard


Re: [PATCH v3] add status config and command line options for rename detection

2018-05-12 Thread Eckhard Maaß
On Fri, May 11, 2018 at 12:56:39PM +, Ben Peart wrote:
> After performing a merge that has conflicts git status will, by default,
> attempt to detect renames which causes many objects to be examined.  In a
> virtualized repo, those objects do not exist locally so the rename logic
> triggers them to be fetched from the server. This results in the status call
> taking hours to complete on very large repos vs seconds with this patch.

I see where your need comes from, but as you based this on my little
patch one can achieve this already with tweaking diff.renames itself. I
do wonder why there is a special need for the status command here. And
if there is, I personally would like it more in a style that you could
take all the options provided by diff.*-configuration and prefix that
with status, eg status.diff.renames = true. What do you think? If you
really only need this for merges, maybe a more specialised option is
called for that only kicks in when there is a merge going on?

I would like that status behaves as similar as possible to
diff/show/log. Special options will pull away from that again - passing
-m to show or log will lead to the same performance issues, correct?
Could it be feasible to impose an overall time limit on the detection?

And after writing this I wonder what were your experience with just
tweaking renameLimit - setting it very low should have helped the
fetching from server part already, shouldn't it?

> Add --no-renames command line option to status that enables overriding the
> config setting from the command line. Add --find-renames[=] command line
> option to status that enables detecting renames and optionally setting the
> similarity index.

Would it be reasonable to extend this so that we just use the same
machinery for parsing command line options for the diffcore options and
pass this along? It seems to me that git status wants the same init as
diff/show/log has anyway. But I like the direction towards passing more
command line options to the git status command. 

>  static void wt_longstatus_print_unmerged_header(struct wt_status *s)
> @@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct 
> wt_status *s)
>   }
>   rev.diffopt.format_callback = wt_status_collect_changed_cb;
>   rev.diffopt.format_callback_data = s;
> + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
> rev.diffopt.detect_rename;
> + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
> rev.diffopt.rename_limit;
> + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
> rev.diffopt.rename_score;
>   copy_pathspec(_data, >pathspec);
>   run_diff_files(, 0);
>  }
> @@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct 
> wt_status *s)
>   rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
>   rev.diffopt.format_callback = wt_status_collect_updated_cb;
>   rev.diffopt.format_callback_data = s;
> + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
> rev.diffopt.detect_rename;
> + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
> rev.diffopt.rename_limit;
> + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
> rev.diffopt.rename_score;
>   copy_pathspec(_data, >pathspec);
>   run_diff_index(, 1);
>  }
> @@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status 
> *s)
>   setup_revisions(0, NULL, , );
>  
>   rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
> + rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : 
> rev.diffopt.detect_rename;
> + rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : 
> rev.diffopt.rename_limit;
> + rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : 
> rev.diffopt.rename_score;
>   rev.diffopt.file = s->fp;
>   rev.diffopt.close_file = 0;
>   /*

Somehow I am inclined that those should be factored out to a common
method if the rest of the patch stays as it is.

Greetings,
Eckhard


Re: [PATCH v2] wt-status: use rename settings from init_diff_ui_defaults

2018-05-02 Thread Eckhard Maaß
On Tue, May 01, 2018 at 05:08:27PM -0700, Elijah Newren wrote:
> Eckhard, can you add some comments to your commit message mentioning
> the email pointed to by Junio about break detection and rename
> detection being unsafe to use together, as well as the inconsistencies
> in break detection between different commands?

I will work on that.

Greetings,
Eckhard


Re: [PATCH v2] wt-status: use rename settings from init_diff_ui_defaults

2018-05-01 Thread Eckhard Maaß
On Tue, May 01, 2018 at 02:23:51PM +0200, Matthieu Moy wrote:
> I'm fine with it as-is. Before your "fix", the config was ignored
> because overwritten by init_diff_ui_defaults() after reading the
> config, so effect of your change is indeed what the commit message
> describes.
> 
> I'm often thinking aloud while reviewing, don't take my comments as
> objections.

No worries, I was wondering while writing the patch to extract it - the
init should be changed to the appropriate location even if there is
consensus to leave all the other knobs as they are, shouldn't it?

Greetings,
Eckhard


Re: [PATCH v2] wt-status: use rename settings from init_diff_ui_defaults

2018-05-01 Thread Eckhard Maaß
On Tue, May 01, 2018 at 01:09:06PM +0200, Matthieu Moy wrote:
> That init_diff_ui_defaults() should indeed have been before
> git_config() from the beginning. My bad, I'm the one who
> misplaced it apparently :-(.

Should I have done this "bug fix" in a separate commit or mention it in
the commit message?

> This "break_opt = 0" deserves a mention in the commit message IMHO.
> I'm not 100% sure it's a good change actually.

Hm, what problems do you see here? The purpose of my patch is have the
same kind of output from git status and, after commit, git show. I
cannot find a good reason for git status to behave there differently,
but I am interested to see where I am wrong.

On the other hand, one could a configuration option to the diff.* family
for controlling that toggle also.

Greetings,
Eckhard


Re: [PATCH v2] wt-status: use rename settings from init_diff_ui_defaults

2018-05-01 Thread Eckhard Maaß
On Tue, May 01, 2018 at 01:00:54PM +0200, Ævar Arnfjörð Bjarmason wrote:
> So didn't we use diff heuristics to begin with, and then regressed? I've
> only given this a skimming, but it's useful to have that sort of
> historical context mentioned explicitly with commit ids.

Sorry for not making this too explicit: I traced wt-status.c to its
beginning in c91f0d92ef ("git-commit.sh: convert run_status to a C
builtin", 2006-09-08). Here I lost track of other changes - but the
commit you gave is earlier also has rename detection hard coded in the
status command. Should I add this as a starting point instead of the
commit mentioned so far? And this also seems like the very beginning of
git status.

The point I wanted to make, is that git show showed you renaming of
files out of the box whereas other commands did not. At least I remember
this form 5+ years ago.

This changed with 5404c116aa, so that the out of the box behaviour is
the same, but this is more a coincidence that the hard coded flag is the
same as the default configuration value.

My comment was targeted at the hard coded rename detection flag - the
other two just have seem to pile up on that and I wanted to clean them
up, too. Maybe a phrasing like "While at it, also remove the two other
hard coded values concerning rename detection in git status." is better?
Is my intent clearer now?

Greetings,
Eckhard


Re: [PATCH v3 2/3] merge: Add merge.renames config setting

2018-04-30 Thread Eckhard Maaß
On Fri, Apr 27, 2018 at 01:23:20PM -0700, Elijah Newren wrote:
> I doubt it has ever been discussed before this thread.  But, if you're
> curious, I'll try to dump a few thoughts.

Thank you, I try to dump some of mine, too. Maybe let me first stress
that for me copy detection without --find-copies-harder is much more a
"find content extracted" (like methods being factored out). In a way
this is nearer to a rename than to a real copy.


> [...] Let's say we have branches
> A and B, and:
>A: modifies file z
>B: copies z to y
> 
> Should the modifications to z done in A propagate to both z and y?  If
> not, what good is copy detection?  If so, then there are several
> ramifications...

If one just assumes the most likely outcome is that something from z wad
factored out to y, it might just be sufficient to see whether the
modifications of the two branches apply cleanly - if A touched the parts
of B that have been factored out there would be a normal merge conflict
(where one could be nice and give a hint that some content was copied to
y on the B branch), if A did not touched the parts touched (or moved) by
B, then there is no problem. If A exactly deleted the content moved by
B, there will be no conflict - but this is seems to be strange anyway.

I admit that a "real" copy would get unnoticed that way. But the
semantics of such a copy isn't too clear for me either - did I copy the
other part to make it independent of the other or did I just employ a
copy and paste tactic? The former does not want the changes, the later
does. But I am happy catering to the former here.

To sum up:
- fail as before for conflicting merges, but give a hint that one has
  copied to quicken up resolution.


> - If B not only copied z but also first modified it, then do we have
>   potential conflicts with both z and y -- possibly the exact same
>   conflicts, making the user resolve them repeatedly?

With the above suggestion, if there are conflicts, you fail and give a
hint.

> - What if A copied z to x?  Do changes to z propagate to all three of
>   z and x and y?  Do changes to either x or y affect z?  Do they
>   affect each other?

A copy on branch to x and one another to y seems strange even if z
merges cleanly. Did both sides try to factor the same thing out to
different files? Or did they try to make something independent, but
managed to make it to different files? For this I would be inclined to
just suggest fail with a copy/copy(somewhere else). But this is a real
corner case after all. Has anyone seen just thing in practice?

> - If A deleted z, does that give us a copy/delete conflict for y?  Do
>   we also have to worry about copy/add conflicts?  copy/add/delete?
>   rename/copy (multiple variants)?  copy/copy?

We do have the modified/deleted conflict where we could hint that
content also has been copied and then not try to do more.

> - Extra degrees of freedom may mean new conflict types:
> 
>   - The extra degrees of freedom from renames introduced multiple new
> conflict types (e.g. rename/add, rename/rename(1to2),
> rename/rename(2to1)).

For renaming one side and coping the other, I would think doing the same
as above is sensible enough: if there are conflicts one can give an
additional hint of the one part having been copied, but not change the
kind of conflicts much.

> The more I think about it, the more I think that attempting to detect
> copies in a merge algorithm just doesn't make sense.  Anything I can
> think of that someone might attempt to use detected copies for would
> just surprise users in a bad way...

Hm, it didn't sound like that. Would you think that users would be
surprised by my suggestions? Or are they all too corner casey to be
worth implementing anyway?

Greetings,
Eckhard


Re: [PATCH v3 2/3] merge: Add merge.renames config setting

2018-04-27 Thread Eckhard Maaß
On Fri, Apr 27, 2018 at 11:23:56AM +0900, Junio C Hamano wrote:
> I think demoting from copy to rename-only is a good idea, at least
> for now, because I do not believe we have figured out what we want
> to happen when we detect copied files are involved in a merge.

Does anyone know some threads concerning that topic? I tried to search
for it, but somehow "merge copy detection" did not find me useful
threads so far. I would be interested in that topic anyway, so I would
like to know what the ideas are that floated so far.

Greetings,
Eckhard


Re: [PATCH v1 1/2] merge: Add merge.renames config setting

2018-04-23 Thread Eckhard Maaß
On Mon, Apr 23, 2018 at 09:15:09AM -0400, Ben Peart wrote:
> In commit 2a2ac926547 when merge.renamelimit was added, it was decided to
> have separate settings for merge and diff to give users the ability to
> control that behavior.  In this particular case, it will default to the
> value of diff.renamelimit when it isn't set.  That isn't consistent with the
> other merge settings.

However, it seems like a desirable way to do it.

Maybe let me throw in some code for discussion (test and documentation
is missing, mainly to form an idea what the change in options should
be). I admit the patch below is concerned only with diff.renames, but
whatever we come up with for merge should be reflected there, too,
doesn't it?

Greetings,
Eckhard

-- >8 --

>From e8a88111f2aaf338a4c19e83251c7178f7152129 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eckhard=20S=2E=20Maa=C3=9F?= 
Date: Sun, 22 Apr 2018 23:29:08 +0200
Subject: [PATCH] diff: enhance diff.renames to be able to set rename score
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Eckhard S. Maaß 
---
 diff.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f..a3cedad5cf 100644
--- a/diff.c
+++ b/diff.c
@@ -30,6 +30,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_rename_score_default;
 static int diff_indent_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
@@ -177,13 +178,33 @@ static int parse_submodule_params(struct diff_options 
*options, const char *valu
return 0;
 }
 
+int parse_rename_score(const char **cp_p);
+
+static int git_config_rename_score(const char *value)
+{
+   int parsed_rename_score = parse_rename_score();
+   if (parsed_rename_score == -1)
+   return error("invalid argument to diff.renamescore: %s", value);
+   diff_rename_score_default = parsed_rename_score;
+   return 0;
+}
+
 static int git_config_rename(const char *var, const char *value)
 {
-   if (!value)
-   return DIFF_DETECT_RENAME;
-   if (!strcasecmp(value, "copies") || !strcasecmp(value, "copy"))
-   return  DIFF_DETECT_COPY;
-   return git_config_bool(var,value) ? DIFF_DETECT_RENAME : 0;
+   if (!value) {
+   diff_detect_rename_default = DIFF_DETECT_RENAME;
+   return 0;
+   }
+   if (skip_to_optional_arg(value, "copies", ) || 
skip_to_optional_arg(value, "copy", )) {
+   diff_detect_rename_default = DIFF_DETECT_COPY;
+   return git_config_rename_score(value);
+   }
+   if (skip_to_optional_arg(value, "renames", ) || 
skip_to_optional_arg(value, "rename", )) {
+   diff_detect_rename_default = DIFF_DETECT_RENAME;
+   return git_config_rename_score(value);
+   }
+   diff_detect_rename_default = git_config_bool(var,value) ? 
DIFF_DETECT_RENAME : 0;
+   return 0;
 }
 
 long parse_algorithm_value(const char *value)
@@ -307,8 +328,7 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
}
if (!strcmp(var, "diff.renames")) {
-   diff_detect_rename_default = git_config_rename(var, value);
-   return 0;
+   return git_config_rename(var, value);
}
if (!strcmp(var, "diff.autorefreshindex")) {
diff_auto_refresh_index = git_config_bool(var, value);
@@ -4116,6 +4136,7 @@ void diff_setup(struct diff_options *options)
options->add_remove = diff_addremove;
options->use_color = diff_use_color_default;
options->detect_rename = diff_detect_rename_default;
+   options->rename_score = diff_rename_score_default;
options->xdl_opts |= diff_algorithm;
if (diff_indent_heuristic)
DIFF_XDL_SET(options, INDENT_HEURISTIC);
-- 
2.17.0.252.gfe0a9eaf31



Re: [PATCH v1 1/2] merge: Add merge.renames config setting

2018-04-22 Thread Eckhard Maaß
On Fri, Apr 20, 2018 at 11:34:25AM -0700, Elijah Newren wrote:
> Sorry, I think I wasn't being clear.  The documentation for the config
> options for e.g. diff.renameLimit, fetch.prune, log.abbrevCommit, and
> merge.ff all mention the equivalent command line parameters.  Your
> patch doesn't do that for merge.renames, but I think it would be
> helpful if it did.

I wonder here what the relation to the diff.* options should be in this
regard anyway. There is already diff.renames. Naively, I would assume
that these options are in sync, that is you control the behavior of both
the normal diff family like git show and git merge. The reasoning, at
least for me, is to keep consistency between the outcome of rename
detection while merging and a later simple "git show MERGE_BASE..HEAD".
I would expect those to give me the same style of rename detection.

Hence, I would like to use diff.renames and maybe enhance this option to
also carry the score in backward compatible way (or introduce a second
configuration option?). Is this idea going in a good direction? If yes,
I will try to submit a patch for this.

Ah, by the way: for people that have not touched diff.renames there will
be no visible change in how Git behaves - the default for diff.renames
is a rename with 50% score with is the same for merge. So it will only
change if one has tweaked diff.renames already. But I wonder if one does
that and expect the merge to use a different rename detection anyway.

Greetings,
Eckhard