Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-30 Thread Torsten Bögershausen
7/7 needs to be amended with something like this,
and the documentation needs an update.

In any case, the user can normalize the repo like this:
$ echo "* text=auto" >>.gitattributes
$ rm .git/index # Remove the index to force Git to
$ git reset # re-scan the working directory
$ git status# Show files that will be normalized
$ git add -u
$ git add .gitattributes
$ git commit -m "Introduce end-of-line normalization"

(or run "dos2unix" filename)




commit a604db36bb946000776514c220964f32979c8756
Author: Torsten Bögershausen 
Date:   Wed Mar 30 15:53:52 2016 +0200

convert.c: add warning when eol are wrong after checkout

When line endings are not normalized, they may be different after the
next checkout to what is configured.
Add a warning, similar to the CRLF-LF replacement, when a file is commited,
and the line endings are not converted at commit or checkout.

diff --git a/convert.c b/convert.c
index 8266d87..1fddbe8 100644
--- a/convert.c
+++ b/convert.c
@@ -18,6 +18,8 @@
 #define CONVERT_STAT_BITS_TXT_CRLF  0x2
 #define CONVERT_STAT_BITS_BIN   0x4
 
+#define CONVERT_STAT_BITS_MIXED (CONVERT_STAT_BITS_TXT_LF | 
CONVERT_STAT_BITS_TXT_CRLF)
+
 enum crlf_action {
CRLF_UNDEFINED,
CRLF_BINARY,
@@ -279,6 +281,8 @@ static void check_safe_crlf(const char *path, enum 
crlf_action crlf_action,
enum safe_crlf checksafe,
unsigned convert_stats, unsigned new_convert_stats)
 {
+   enum eol new_eol = output_eol(crlf_action);
+   const char *err_warn_msg = NULL;
if (!checksafe)
return;
if (convert_stats & CONVERT_STAT_BITS_TXT_CRLF &&
@@ -303,6 +307,19 @@ static void check_safe_crlf(const char *path, enum 
crlf_action crlf_action,
else /* i.e. SAFE_CRLF_FAIL */
die("LF would be replaced by CRLF in %s", path);
}
+   if ((new_convert_stats & CONVERT_STAT_BITS_MIXED) == 
CONVERT_STAT_BITS_MIXED)
+   err_warn_msg = "mixed eol";
+   else if (new_eol == EOL_LF && new_convert_stats & 
CONVERT_STAT_BITS_TXT_CRLF)
+   err_warn_msg = "CRLF";
+
+   if (err_warn_msg) {
+   if (checksafe == SAFE_CRLF_WARN)
+   warning("%s will be present after commit and checkout 
in %s.",
+   err_warn_msg, path);
+   else
+   die("%s will be present after commit and checkout in 
%s",
+   err_warn_msg, path);
+   }
 }
 
[snip changes in t0027] 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>> Torsten Bögershausen  writes:
>>
>>> If we had made the CRLF -> LF conversion, yes. But we don't do that.
>>> crlf_to_git() returns without touching the line endings.
>>
>> That sounds quite broken.  How would a user ever fix broken data in
>> the index then?  I know the commit that often appears in these
>> discussions claims to give us "safer CRLF" handling, but I have a
>> suspicion that perhaps we should rethink if that claim is really
>> true.  Isn't it giving us more problems than it is worth?
>
> Having said all that, within the context of the current codebase
> where autocrlf refuses to do the conversion, I agree that teaching
> blame to follow the same logic makes sense.  Let me review the
> series up to 6/7 with fresh eyes.

And for the same reason 7/7 may make sense (I didn't check the
implementation, but I think I understand your motivation well enough
to comment)--if crlf-to-git returns without actually converting upon
next "git add $path", in order to serve a preview of what change you
would be checking in and/or committing when you do "git add $path"
and/or "git commit $path", the _to_git() conversion "git diff" and
"git diff HEAD" do on the contents taken from the working tree
should follow the same logic.

"git diff $tree-ish" (of which "git diff HEAD" is a special case) is
a bit tricky to reason about, but I think using the "does index have
CR to cause us refuse conversion?" logic is a sensible thing to do
even in that case.  It is asking what difference you would have if
you committed the state in the working tree right now, and the "does
index have CR?" logic to kick in, even though the contents of the
index may not be something that was derived from the unrelated
$tree-ish, would kick in when you make the hypothetical commit to be
compared against $tree-ish.

Again, let me review 7/7 as well with fresh eyes.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Torsten Bögershausen  writes:
>
>> If we had made the CRLF -> LF conversion, yes. But we don't do that.
>> crlf_to_git() returns without touching the line endings.
>
> That sounds quite broken.  How would a user ever fix broken data in
> the index then?  I know the commit that often appears in these
> discussions claims to give us "safer CRLF" handling, but I have a
> suspicion that perhaps we should rethink if that claim is really
> true.  Isn't it giving us more problems than it is worth?

Having said all that, within the context of the current codebase
where autocrlf refuses to do the conversion, I agree that teaching
blame to follow the same logic makes sense.  Let me review the
series up to 6/7 with fresh eyes.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-29 Thread Junio C Hamano
Torsten Bögershausen  writes:

> If we had made the CRLF -> LF conversion, yes. But we don't do that.
> crlf_to_git() returns without touching the line endings.

That sounds quite broken.  How would a user ever fix broken data in
the index then?  I know the commit that often appears in these
discussions claims to give us "safer CRLF" handling, but I have a
suspicion that perhaps we should rethink if that claim is really
true.  Isn't it giving us more problems than it is worth?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-29 Thread Junio C Hamano
Torsten Bögershausen  writes:

> # Here the lines are not going to be normalized at the next commit.
> # They stay CRLF.

Isn't that the real source of the problem?  Why don't we fix that
then?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-29 Thread Torsten Bögershausen
On 29.03.16 19:21, Junio C Hamano wrote:
> tbo...@web.de writes:
> 
>> From: Torsten Bögershausen 
>>
>> git blame reports lines as not "Not Committed Yet" when they have
>> CRLF in the index, CRLF in the worktree and e.g. core.autocrlf is true.
>>
>> Since commit c48053 "new safer autocrlf handling", files that have CRLF
>> in the index are not normalized at commit when e.g. core.autocrl is set.
>>
>> Whenever a CLRF conversion is needed (or any filter us set), load the
>> index temporally, before calling convert_to_git()
> 
> Sorry, but I do not understand what problem you are trying to
> address here.
> 
> Under the same condition described in the first paragraph, what
> would "git diff" and "git diff HEAD" say?  They should show that you
> would be making a commit that corrects the line ending of the blob
> recorded in the history.
> 
Let's make an experiment, Git v2.8.0


$ printf "Line1\r\nLine2\r\n" >test_CRLF.txt
$ git add test_CRLF.txt 
$ git commit -m "add test_CRLF.txt"
 [detached HEAD 719c166] add test_CRLF.txt
 1 file changed, 2 insertions(+)
 create mode 100644 test_CRLF.txt

$ git ls-files --eol test_CRLF.txt 
i/crlf  w/crlf  attr/   test_CRLF.txt
# So far, so good.

git config core.autocrlf true

# Now lets patch Git to debug the safer CRLF handling
diff --git a/convert.c b/convert.c
index f524b8d..fcf7653 100644
--- a/convert.c
+++ b/convert.c
@@ -260,8 +260,11 @@ static int crlf_to_git(const char *path, const char *src, 
size_t len,
 * If the file in the index has any CR in it, do not 
convert.
 * This is the new safer autocrlf handling.
 */
-   if (has_cr_in_index(path))
+   if (has_cr_in_index(path)) {
+   fprintf(stderr, "%s/%s:%d 
has_cr_in_index(%s)\n",
+   __FILE__, __FUNCTION__, __LINE__, path);
return 0;
+   }

# Of course, run make
$ make
#
printf "Line3\r\n" >>test_CRLF.txt

# Lets see what diff says:
./git diff test_CRLF.txt | od -c
convert.c/crlf_to_git:265 has_cr_in_index(test_CRLF.txt)
convert.c/crlf_to_git:265 has_cr_in_index(test_CRLF.txt)
000d   i   f   f   -   -   g   i   t   a   /   t   e   s
020t   _   C   R   L   F   .   t   x   t   b   /   t   e   s
040t   _   C   R   L   F   .   t   x   t  \n   i   n   d   e   x
0604   a   a   5   5   1   d   .   .   d   0   f   a   f   1
100d   1   0   0   6   4   4  \n   -   -   -   a   /   t
120e   s   t   _   C   R   L   F   .   t   x   t  \n   +   +   +
140b   /   t   e   s   t   _   C   R   L   F   .   t   x   t
160   \n   @   @   -   1   ,   2   +   1   ,   3   @   @
200   \n   L   i   n   e   1  \r  \n   L   i   n   e   2  \r
220   \n   +   l   i   n   e   3  \r  \n
231
# Here the lines are not going to be normalized at the next commit.
# They stay CRLF.
# But git blame doesn't know that, because has_cr_in_index doesn't work
# without an index.

$ ./git blame test_CRLF.txt 
 (Not Committed Yet 2016-03-29 21:44:48 +0200 1) Line1
 (Not Committed Yet 2016-03-29 21:44:48 +0200 2) Line2
 (Not Committed Yet 2016-03-29 21:44:48 +0200 3) line3



$ git commit -m "Add line3" test_CRLF.txt

> The "Not Committed Yet" output from "git blame" is the same thing.
> It is telling you that the commit you would be making by adding
> that path from the working tree in its current state will become
> the one that is responsible for the final state of the line.
> 
> So it is absolutely the right thing that these lines are shown as
> "Not Commited Yet".  You will be making the line-ending correction
> for the entire blob, and you should be made aware of it.
If we had made the CRLF -> LF conversion, yes. But we don't do that.
crlf_to_git() returns without touching the line endings.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 6/7] correct blame for files commited with CRLF

2016-03-29 Thread Junio C Hamano
tbo...@web.de writes:

> From: Torsten Bögershausen 
>
> git blame reports lines as not "Not Committed Yet" when they have
> CRLF in the index, CRLF in the worktree and e.g. core.autocrlf is true.
>
> Since commit c48053 "new safer autocrlf handling", files that have CRLF
> in the index are not normalized at commit when e.g. core.autocrl is set.
>
> Whenever a CLRF conversion is needed (or any filter us set), load the
> index temporally, before calling convert_to_git()

Sorry, but I do not understand what problem you are trying to
address here.

Under the same condition described in the first paragraph, what
would "git diff" and "git diff HEAD" say?  They should show that you
would be making a commit that corrects the line ending of the blob
recorded in the history.

The "Not Committed Yet" output from "git blame" is the same thing.
It is telling you that the commit you would be making by adding
that path from the working tree in its current state will become
the one that is responsible for the final state of the line.

So it is absolutely the right thing that these lines are shown as
"Not Commited Yet".  You will be making the line-ending correction
for the entire blob, and you should be made aware of it.

Now, it would be a very good idea for such a correction change to
have no other change, and it would be a very good idea to give users
a tool to see if there is any change other than the eol correction.
There is "git blame -w" to ignore whitespace changes, and I think
that would hide the CRLF/LF correction, but that hides all other kinds
of whitespace changes, so it is too broad for the purpose.

I do not think I'd be opposed to a new option to allow the command
to ignore _only_ eol changes (this applies not just to "blame" but
also to "diff", too).  That way:

 * The users can more easily make sure that the set of changes being
   prepared does not do anything other than correcting eol;

 * "git blame" can be told to relieve a commit of the responsibility
   for lines if the only change it did to them is to correct eol
   when digging the history (i.e. this will not just help "Not
   Committed Yet" changes in the working tree, but for digging
   through historical events).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html