Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Torsten Bögershausenwrites: >>> enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL >>> ? SAFE_CRLF_WARN >>> : safe_crlf); >>> + if (size_only) >>> + crlf_warn = SAFE_CRLF_FALSE; >> >> If you were to go this route, it may be sufficient to change its >> initialization from WARN to FALSE _unconditionally_, because this >> function uses the convert_to_git() only to _show_ the differences by >> computing canonical form out of working tree contents, and the >> conversion is not done to _write_ into object database to create a >> new object. > > Hm, since when (is it not used) ? Since forever, but my statement above said "this function", which may have confused you, where it could have said diff_populate_filespec(). Surely it is possible for somebody to diff_populate_filespec(s, 0) and then call hash_sha1_file(s->data, s->size, "blob", ...) to write the data into the object database to create a new object. But that sounds really crazy, no? > The SAFE_CRLF_FAIL was converted into WARN here: > commit 5430bb283b478991a979437a79e10dcbb6f20e28 > Author: Junio C Hamano > Date: Mon Jun 24 14:35:04 2013 -0700 > > diff: demote core.safecrlf=true to core.safecrlf=warn Yes. > Does this all means that, looking back, 5430bb283b478991 could have been more > aggressive and could have used SAFE_CRLF_FALSE ? That is pretty much the statement, to which you said "since when", suspects. > And we can do this change now? I am not sure. The conversion the safe-crlf code does is unsafe and it is a disservice to users not to warn whenever we notice they are risking information loss. Maybe time they run "git diff" is not a good time to warn, as they may not be actually adding the file as-is, but if warning against information loss at "git diff" time is important enough, the I think that should not be squelched by the "--quiet" option, which is about "do not show the patch text output". It should not be taken as "do not diagnose any errors".
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On 2017-03-03 18:47, Junio C Hamano wrote: > Torsten Bögershausenwrites: > >> Understood, thanks for the explanation. >> >> quiet is not quite any more.. >> >> Does the following fix help ? >> >> --- a/diff.c >> +++ b/diff.c >> @@ -2826,6 +2826,8 @@ int diff_populate_filespec(struct diff_filespec *s, >> unsigned int flags) >> enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL >> ? SAFE_CRLF_WARN >> : safe_crlf); >> + if (size_only) >> + crlf_warn = SAFE_CRLF_FALSE; > > If you were to go this route, it may be sufficient to change its > initialization from WARN to FALSE _unconditionally_, because this > function uses the convert_to_git() only to _show_ the differences by > computing canonical form out of working tree contents, and the > conversion is not done to _write_ into object database to create a > new object. Hm, since when (is it not used) ? I thought that it is needed to support the safecrlf handling introduced in 21e5ad50fc5e7277c74cfbb3cf6502468e840f86 Author: Steffen Prohaska Date: Wed Feb 6 12:25:58 2008 +0100 safecrlf: Add mechanism to warn about irreversible crlf conversions - The SAFE_CRLF_FAIL was converted into WARN here: commit 5430bb283b478991a979437a79e10dcbb6f20e28 Author: Junio C Hamano Date: Mon Jun 24 14:35:04 2013 -0700 diff: demote core.safecrlf=true to core.safecrlf=warn Otherwise the user will not be able to start to guess where in the contents in the working tree the offending unsafe CR lies. My understanding is that we don't want to break the safecrlf feature, but after applying diff --git a/diff.c b/diff.c index a628ac3a95..a05d88dd9f 100644 --- a/diff.c +++ b/diff.c @@ -2820,12 +2820,10 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags) int size_only = flags & CHECK_SIZE_ONLY; int err = 0; /* -* demote FAIL to WARN to allow inspecting the situation -* instead of refusing. +* Don't use FAIL or WARN as this code is not called when _writing_ +* into object database to create a new object. */ - enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL - ? SAFE_CRLF_WARN - : safe_crlf); + enum safe_crlf crlf_warn = SAFE_CRLF_FALSE; None of the test cases in t0020--t0027 fails or complain about missing warnings. Does this all means that, looking back, 5430bb283b478991 could have been more aggressive and could have used SAFE_CRLF_FALSE ? And we can do this change now? (If the answer is yes, we don't need to deal with the problem below) > Having size_only here is not a sign of getting --quiet passed from > the command line, by the way. >
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Torsten Bögershausenwrites: > Understood, thanks for the explanation. > > quiet is not quite any more.. > > Does the following fix help ? > > --- a/diff.c > +++ b/diff.c > @@ -2826,6 +2826,8 @@ int diff_populate_filespec(struct diff_filespec *s, > unsigned int flags) > enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL > ? SAFE_CRLF_WARN > : safe_crlf); > + if (size_only) > + crlf_warn = SAFE_CRLF_FALSE; If you were to go this route, it may be sufficient to change its initialization from WARN to FALSE _unconditionally_, because this function uses the convert_to_git() only to _show_ the differences by computing canonical form out of working tree contents, and the conversion is not done to _write_ into object database to create a new object. Having size_only here is not a sign of getting --quiet passed from the command line, by the way.
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Understood, thanks for the explanation. quiet is not quite any more.. Does the following fix help ? --- a/diff.c +++ b/diff.c @@ -2826,6 +2826,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags) enum safe_crlf crlf_warn = (safe_crlf == SAFE_CRLF_FAIL ? SAFE_CRLF_WARN : safe_crlf); + if (size_only) + crlf_warn = SAFE_CRLF_FALSE;
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Hi Torsten, Your patch has been superseded, but I thought I ought to answer your questions rather than leave them hanging. On Thursday 02 March 2017 at 19:17:00 +0100, Torsten Bögershausen wrote: > On 2017-03-01 22:25, Mike Crowe wrote: > > On Wednesday 01 March 2017 at 18:04:44 +0100, tbo...@web.de wrote: > >> From: Junio C Hamano> >> > >> git diff --quiet may take a short-cut to see if a file is changed > >> in the working tree: > >> Whenever the file size differs from what is recorded in the index, > >> the file is assumed to be changed and git diff --quiet returns > >> exit with code 1 > >> > >> This shortcut must be suppressed whenever the line endings are converted > >> or a filter is in use. > >> The attributes say "* text=auto" and a file has > >> "Hello\nWorld\n" in the index with a length of 12. > >> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14. > >> (Or even "Hello\r\nWorld\n"). > >> In this case "git add" will not do any changes to the index, and > >> "git diff -quiet" should exit 0. > >> > >> Add calls to would_convert_to_git() before blindly saying that a different > >> size means different content. > >> > >> Reported-By: Mike Crowe > >> Signed-off-by: Torsten Bögershausen > >> --- > >> This is what I can come up with, collecting all the loose ends. > >> I'm not sure if Mike wan't to have the Reported-By with a > >> Signed-off-by ? > >> The other question is, if the commit message summarizes the discussion > >> well enough ? > >> > >> diff.c| 18 ++ > >> t/t0028-diff-converted.sh | 27 +++ > >> 2 files changed, 41 insertions(+), 4 deletions(-) > >> create mode 100755 t/t0028-diff-converted.sh > >> > >> diff --git a/diff.c b/diff.c > >> index 051761b..c264758 100644 > >> --- a/diff.c > >> +++ b/diff.c > >> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct > >> diff_filepair *p) > >> *differences. > >> * > >> * 2. At this point, the file is known to be modified, > >> - *with the same mode and size, and the object > >> - *name of one side is unknown. Need to inspect > >> - *the identical contents. > >> + *with the same mode and size, the object > >> + *name of one side is unknown, or size comparison > >> + *cannot be depended upon. Need to inspect the > >> + *contents. > >> */ > >>if (!DIFF_FILE_VALID(p->one) || /* (1) */ > >>!DIFF_FILE_VALID(p->two) || > >> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct > >> diff_filepair *p) > >>(p->one->mode != p->two->mode) || > >>diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || > >>diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || > >> - (p->one->size != p->two->size) || > >> + > >> + /* > >> + * only if eol and other conversions are not involved, > >> + * we can say that two contents of different sizes > >> + * cannot be the same without checking their contents. > >> + */ > >> + (!would_convert_to_git(p->one->path) && > >> + !would_convert_to_git(p->two->path) && > >> + (p->one->size != p->two->size)) || > >> + > >>!diff_filespec_is_identical(p->one, p->two)) /* (2) */ > >>p->skip_stat_unmatch_result = 1; > >>return p->skip_stat_unmatch_result; > >> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh > >> new file mode 100755 > >> index 000..3d5ab95 > >> --- /dev/null > >> +++ b/t/t0028-diff-converted.sh > >> @@ -0,0 +1,27 @@ > >> +#!/bin/sh > >> +# > >> +# Copyright (c) 2017 Mike Crowe > >> +# > >> +# These tests ensure that files changing line endings in the presence > >> +# of .gitattributes to indicate that line endings should be ignored > >> +# don't cause 'git diff' or 'git diff --quiet' to think that they have > >> +# been changed. > >> + > >> +test_description='git diff with files that require CRLF conversion' > >> + > >> +. ./test-lib.sh > >> + > >> +test_expect_success setup ' > >> + echo "* text=auto" >.gitattributes && > >> + printf "Hello\r\nWorld\r\n" >crlf.txt && > >> + git add .gitattributes crlf.txt && > >> + git commit -m "initial" > >> +' > >> + > >> +test_expect_success 'quiet diff works on file with line-ending change > >> that has no effect on repository' ' > >> + printf "Hello\r\nWorld\n" >crlf.txt && > >> + git status && > >> + git diff --quiet > >> +' > >> + > >> +test_done > > [snip] > > Also, I think I've found a behaviour change with this fix. Consider: > > > > echo "* text=auto" >.gitattributes > > printf "Hello\r\nWorld\r\n" >crlf.txt > That should give > "Hello\nWorld\n" in the index: > > git add .gitattributes crlf.txt > warning: CRLF will be replaced by LF in ttt/crlf.txt. > The file will have its original line endings in your working directory. > tb@mac:/tmp/ttt> git commit -m "initial" > [master (root-commit) 354f657]
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On Thursday 02 March 2017 at 10:33:59 -0800, Junio C Hamano wrote: > Mike Crowewrites: > > > All the solutions presented so far do cause a small change in behaviour > > when using git diff --quiet: they may now cause warning messages like: > > > > warning: CRLF will be replaced by LF in crlf.txt. > > The file will have its original line endings in your working directory. > > That is actually a good thing, I think. As the test modifies a file > that originally has "Hello\r\nWorld\r\n" in it to this: > > >> +test_expect_success 'quiet diff works on file with line-ending change > >> that has no effect on repository' ' > >> + printf "Hello\r\nWorld\n" >crlf.txt && > > If you did "git add" at this point, you would get the same warning, > because the lack of CR on the second line could well be a mistake > you may want to notice and fix before going forward. Otherwise > you'd be losing information that _might_ matter to you (i.e. the > fact that the first line had CRLF while the second had LF) and it is > the whole point of safe_crlf setting. Well, there is an argument that it's not very "--quiet" to emit this message. Especially when it didn't used to come out. However, I can understand that the message is useful if the line endings have changed despite this. However, I can make the message appear from "git diff --quiet" even if the line endings have not changed. I merely need to touch a file where the line endings do not match the canonical representation in the repository first. Upon subsequent invocations of "git diff --quiet" the message does not come out. (Note that in this may not be reproducible in a script without sleeps.) Perhaps this interactive log will make things clearer: $ git init Initialized empty Git repository in /tmp/test/.git/ $ echo "* text=auto" >.gitattributes $ printf "Hello\r\nWorld\r\n" >crlf.txt $ git add .gitattributes crlf.txt warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. The message was expected and useful there. $ git commit -m "initial" [master (root-commit) c3fb5a5] initial 2 files changed, 3 insertions(+) create mode 100644 .gitattributes create mode 100644 crlf.txt $ git diff --quiet $ touch crlf.txt $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. I didn't change the file - I just touched it. Why did the message come out here? $ git diff --quiet But then it didn't here. Which is correct? $ printf "Hello\r\nWorld\n" >crlf.txt $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. If the line endings have genuinely changed then the message comes out every time... $ git add crlf.txt warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. ...until the file is added to the index. That's probably the right thing to do. $ git diff --quiet $ touch crlf.txt $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. $ git diff --quiet But once the file has been added the previous behaviour of only emitting the message on the first time after the touch occurs. $ printf "Hello\r\nWorld\n" >crlf.txt $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. $ git diff --quiet $ printf "Hello\r\nWorld\r\n" >crlf.txt $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. $ git diff --quiet warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. Hopefully that makes things a bit clearer. Mike.
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On 2017-03-02 15:20, Mike Crowe wrote: > ll the solutions presented so far do cause a small change in behaviour > when using git diff --quiet: they may now cause warning messages like: > > warning: CRLF will be replaced by LF in crlf.txt. > The file will have its original line endings in your working directory. Ah, that is not ideal. I can have a look at it later (or due to the weekend)
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On Thu, Mar 02, 2017 at 09:52:21AM -0800, Junio C Hamano wrote: > >> + * and is_binary check being that we want to avoid > >> + * opening the file and inspecting the contents, this > >> + * is probably fine. > >> + */ > >>if ((flags & CHECK_BINARY) && > >>s->size > big_file_threshold && s->is_binary == -1) { > >>s->is_binary = 1; > > > > I'm trying to think how this "not strictly correct" could bite us. > > Note that the comment is just documenting what I learned and thought > while working on an unrelated thing that happened to be sitting next > to it. Yeah, sorry, this is obviously not a blocker to your patch. I'm just wondering if there is more work needed. > To be quite honest, I do not think this code should cater to LFS or > any other conversion hack. They all install their own diff driver > and they can tell diff_filespec_is_binary() if the thing is binary > or not without falling back to this heuristics codepath. Yeah, you're right, I was just being silly. Whatever configured the filter already has an opportunity to give us this knowledge in a better way, and we should rely on that. -Peff
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Jeff Kingwrites: >> diff --git a/diff.c b/diff.c >> index 8c78fce49d..dc51dceb44 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, >> unsigned int flags) >> s->should_free = 1; >> return 0; >> } >> -if (size_only) >> + >> +/* >> + * Even if the caller would be happy with getting >> + * only the size, we cannot return early at this >> + * point if the path requires us to run the content >> + * conversion. >> + */ >> +if (!would_convert_to_git(s->path) && size_only) >> return 0; > > The would_convert_to_git() function is a little expensive (it may have > to do an attribute lookup). It may be worth swapping the two halves of > the conditional here to get the short-circuit. Yes. I think it makes sense. >> + >> +/* >> + * Note: this check uses xsize_t(st.st_size) that may >> + * not be the true size of the blob after it goes >> + * through convert_to_git(). This may not strictly be >> + * correct, but the whole point of big_file_threashold > > s/threashold/threshold/ Thanks. I felt there was something wrong and looked at the line three times but somehow failed to spot exactly what was wrong ;-) > >> + * and is_binary check being that we want to avoid >> + * opening the file and inspecting the contents, this >> + * is probably fine. >> + */ >> if ((flags & CHECK_BINARY) && >> s->size > big_file_threshold && s->is_binary == -1) { >> s->is_binary = 1; > > I'm trying to think how this "not strictly correct" could bite us. Note that the comment is just documenting what I learned and thought while working on an unrelated thing that happened to be sitting next to it. Nobody asks "I am OK without the contents i.e. size-only" and "Can you see if this is binary?" at the same time (and if a caller did, it would never have got is_binary with the original code). s->size is still a copy of st.size at this point of the code (we have not actually updated it to the size of the real blob, which happens a bit later in the flow of this codepath where we actually slurp the thing in and run the conversion). So with or without this patch, there shouldn't be any change in the behaviour wrt CHECK_BINARY. > For > line-ending conversion, I'd say that the before/after are going to be > approximately the same size. But what about something like LFS? If I > have a 600MB file that convert_to_git() filters into a short LFS > pointer, I think this changes the behavior. Before, we would diff the > pointer file, but now we'll get "binary file changed". To be quite honest, I do not think this code should cater to LFS or any other conversion hack. They all install their own diff driver and they can tell diff_filespec_is_binary() if the thing is binary or not without falling back to this heuristics codepath. What is done here *is* inconsistent with what is done on the other side of if/else, that compares big_file_threshold with in-git size, not in-working-tree size. That may want to be addressed, but I am not sure if it is worth it.
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Mike Crowewrites: > All the solutions presented so far do cause a small change in behaviour > when using git diff --quiet: they may now cause warning messages like: > > warning: CRLF will be replaced by LF in crlf.txt. > The file will have its original line endings in your working directory. That is actually a good thing, I think. As the test modifies a file that originally has "Hello\r\nWorld\r\n" in it to this: >> +test_expect_success 'quiet diff works on file with line-ending change that >> has no effect on repository' ' >> +printf "Hello\r\nWorld\n" >crlf.txt && If you did "git add" at this point, you would get the same warning, because the lack of CR on the second line could well be a mistake you may want to notice and fix before going forward. Otherwise you'd be losing information that _might_ matter to you (i.e. the fact that the first line had CRLF while the second had LF) and it is the whole point of safe_crlf setting. I also think it is a good thing if "git status" reported this path as modified for the same reason (I didn't actually check if that is the case).
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On 2017-03-01 22:25, Mike Crowe wrote: > On Wednesday 01 March 2017 at 18:04:44 +0100, tbo...@web.de wrote: >> From: Junio C Hamano>> >> git diff --quiet may take a short-cut to see if a file is changed >> in the working tree: >> Whenever the file size differs from what is recorded in the index, >> the file is assumed to be changed and git diff --quiet returns >> exit with code 1 >> >> This shortcut must be suppressed whenever the line endings are converted >> or a filter is in use. >> The attributes say "* text=auto" and a file has >> "Hello\nWorld\n" in the index with a length of 12. >> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14. >> (Or even "Hello\r\nWorld\n"). >> In this case "git add" will not do any changes to the index, and >> "git diff -quiet" should exit 0. >> >> Add calls to would_convert_to_git() before blindly saying that a different >> size means different content. >> >> Reported-By: Mike Crowe >> Signed-off-by: Torsten Bögershausen >> --- >> This is what I can come up with, collecting all the loose ends. >> I'm not sure if Mike wan't to have the Reported-By with a >> Signed-off-by ? >> The other question is, if the commit message summarizes the discussion >> well enough ? >> >> diff.c| 18 ++ >> t/t0028-diff-converted.sh | 27 +++ >> 2 files changed, 41 insertions(+), 4 deletions(-) >> create mode 100755 t/t0028-diff-converted.sh >> >> diff --git a/diff.c b/diff.c >> index 051761b..c264758 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct >> diff_filepair *p) >> *differences. >> * >> * 2. At this point, the file is known to be modified, >> - *with the same mode and size, and the object >> - *name of one side is unknown. Need to inspect >> - *the identical contents. >> + *with the same mode and size, the object >> + *name of one side is unknown, or size comparison >> + *cannot be depended upon. Need to inspect the >> + *contents. >> */ >> if (!DIFF_FILE_VALID(p->one) || /* (1) */ >> !DIFF_FILE_VALID(p->two) || >> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct >> diff_filepair *p) >> (p->one->mode != p->two->mode) || >> diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || >> diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || >> -(p->one->size != p->two->size) || >> + >> +/* >> + * only if eol and other conversions are not involved, >> + * we can say that two contents of different sizes >> + * cannot be the same without checking their contents. >> + */ >> +(!would_convert_to_git(p->one->path) && >> + !would_convert_to_git(p->two->path) && >> + (p->one->size != p->two->size)) || >> + >> !diff_filespec_is_identical(p->one, p->two)) /* (2) */ >> p->skip_stat_unmatch_result = 1; >> return p->skip_stat_unmatch_result; >> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh >> new file mode 100755 >> index 000..3d5ab95 >> --- /dev/null >> +++ b/t/t0028-diff-converted.sh >> @@ -0,0 +1,27 @@ >> +#!/bin/sh >> +# >> +# Copyright (c) 2017 Mike Crowe >> +# >> +# These tests ensure that files changing line endings in the presence >> +# of .gitattributes to indicate that line endings should be ignored >> +# don't cause 'git diff' or 'git diff --quiet' to think that they have >> +# been changed. >> + >> +test_description='git diff with files that require CRLF conversion' >> + >> +. ./test-lib.sh >> + >> +test_expect_success setup ' >> +echo "* text=auto" >.gitattributes && >> +printf "Hello\r\nWorld\r\n" >crlf.txt && >> +git add .gitattributes crlf.txt && >> +git commit -m "initial" >> +' >> + >> +test_expect_success 'quiet diff works on file with line-ending change that >> has no effect on repository' ' >> +printf "Hello\r\nWorld\n" >crlf.txt && >> +git status && >> +git diff --quiet >> +' >> + >> +test_done > > Hi Torsten, > > Thanks for investigating this. > > I think that you've simplified the test to the point where it doesn't > entirely prove the fix. Although you test the case where the file has > changed size, you don't test the case where it hasn't. > > Unfortunately that was the part of my test that could only reproduce the > problem with the sleeps. Maybe someone who understands how the cache works > fully could explain an alternative way to force the cache not to be used. > > Also, I think I've found a behaviour change with this fix. Consider: > > echo "* text=auto" >.gitattributes > printf "Hello\r\nWorld\r\n" >crlf.txt That should give "Hello\nWorld\n" in the index: git add .gitattributes crlf.txt warning: CRLF will be replaced by LF in ttt/crlf.txt. The file will have its original
git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions)
On Tuesday 28 February 2017 at 19:06:44 +0100, Torsten Bögershausen wrote: > My understanding is that git diff --quiet should be quiet, when > git add will not do anything (but the file is "touched". > The touched means that Git will detect e.g a new mtime or inode > or file size when doing lstat(). Does the same apply to "git status"? If so, then whilst investigating the "git diff --quiet" problems in this thread I've found a similar bug with "git status". It reports the file has modifications even if only the line-endings have changed, and issuing "git add" causes the perceived modification to disappear. It can be very confusing for users if "git status" reports a modification but for "git diff" to claim that the files are identical. This bug is still reproducible even with the fix from https://public-inbox.org/git/xmqqshmyhtnu@gitster.mtv.corp.google.com/T/#m67cbfad1f2efe721f0c2afac2a1523b743bb57ca Here's the test case. Test 3 is the part that currently fails: commit de5f3f1d9161cdd46342689abe38a046fc71850e Author: Mike CroweDate: Sat Feb 25 09:28:37 2017 + status: Add tests for status output when file line endings change diff --git a/t/t7518-status-eol-change.sh b/t/t7518-status-eol-change.sh new file mode 100755 index 000..e18186f --- /dev/null +++ b/t/t7518-status-eol-change.sh @@ -0,0 +1,37 @@ +#!/bin/sh +# +# Copyright (c) 2017 Mike Crowe +# + +test_description='git status with files that require CRLF conversion' + +. ./test-lib.sh + +cat >expected_no_change < .gitattributes && + printf "Hello\r\nWorld\r\n" > crlf.txt && + printf "expected_no_change\nactual\n" > .gitignore && + git add .gitignore .gitattributes crlf.txt && + git commit -m "initial" +' +test_expect_success 'git status reports no change if file regenerated' ' + printf "Hello\r\nWorld\r\n" > crlf.txt && + git status >actual && + test_cmp expected_no_change actual +' +test_expect_success 'git status reports no change if line endings change' ' + printf "Hello\nWorld\n" > crlf.txt && + git status >actual && + test_cmp expected_no_change actual +' +test_expect_success 'git status reports no change if line ending change is staged' ' + git add crlf.txt && + git status >actual && + test_cmp expected_no_change actual +' +test_done
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On Wednesday 01 March 2017 at 13:54:26 -0800, Junio C Hamano wrote: > Now I thought about it through a bit more thoroughly, I think this > is the right approach, so here is my (tenative) final version. > > I seem to be getty really rusty---after all the codepaths involved > are practically all my code and I should have noticed the real > culprit during my first attempt X-<. > > Thanks for helping. > > -- >8 -- > Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in > diff_populate_filespec() > > Callers of diff_populate_filespec() can choose to ask only for the > size of the blob without grabbing the blob data, and the function, > after running lstat() when the filespec points at a working tree > file, returns by copying the value in size field of the stat > structure into the size field of the filespec when this is the case. > > However, this short-cut cannot be taken if the contents from the > path needs to go through convert_to_git(), whose resulting real blob > data may be different from what is in the working tree file. > > As "git diff --quiet" compares the .size fields of filespec > structures to skip content comparison, this bug manifests as a > false "there are differences" for a file that needs eol conversion, > for example. > > Reported-by: Mike Crowe> Helped-by: Torsten Bögershausen > Signed-off-by: Junio C Hamano > --- > diff.c| 19 ++- > t/t0028-diff-converted.sh | 27 +++ > 2 files changed, 45 insertions(+), 1 deletion(-) > create mode 100755 t/t0028-diff-converted.sh > > diff --git a/diff.c b/diff.c > index 8c78fce49d..dc51dceb44 100644 > --- a/diff.c > +++ b/diff.c > @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, > unsigned int flags) > s->should_free = 1; > return 0; > } > - if (size_only) > + > + /* > + * Even if the caller would be happy with getting > + * only the size, we cannot return early at this > + * point if the path requires us to run the content > + * conversion. > + */ > + if (!would_convert_to_git(s->path) && size_only) > return 0; > + > + /* > + * Note: this check uses xsize_t(st.st_size) that may > + * not be the true size of the blob after it goes > + * through convert_to_git(). This may not strictly be > + * correct, but the whole point of big_file_threashold > + * and is_binary check being that we want to avoid > + * opening the file and inspecting the contents, this > + * is probably fine. > + */ > if ((flags & CHECK_BINARY) && > s->size > big_file_threshold && s->is_binary == -1) { > s->is_binary = 1; This patch solves the problem for me. Including my tests where the file size doesn't change but the file has been touched. It also doesn't have the side effect of failing to report the extra trailing newline that the original fix suffered from. All the solutions presented so far do cause a small change in behaviour when using git diff --quiet: they may now cause warning messages like: warning: CRLF will be replaced by LF in crlf.txt. The file will have its original line endings in your working directory. to be emitted (unless of course core.safecrlf=false.) I think this is an unavoidable side-effect of doing the job properly but it might be worth mentioning. > diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh > new file mode 100755 > index 00..3d5ab9565b > --- /dev/null > +++ b/t/t0028-diff-converted.sh > @@ -0,0 +1,27 @@ > +#!/bin/sh > +# > +# Copyright (c) 2017 Mike Crowe > +# > +# These tests ensure that files changing line endings in the presence > +# of .gitattributes to indicate that line endings should be ignored > +# don't cause 'git diff' or 'git diff --quiet' to think that they have > +# been changed. > + > +test_description='git diff with files that require CRLF conversion' > + > +. ./test-lib.sh > + > +test_expect_success setup ' > + echo "* text=auto" >.gitattributes && > + printf "Hello\r\nWorld\r\n" >crlf.txt && > + git add .gitattributes crlf.txt && > + git commit -m "initial" > +' > + > +test_expect_success 'quiet diff works on file with line-ending change that > has no effect on repository' ' > + printf "Hello\r\nWorld\n" >crlf.txt && > + git status && > + git diff --quiet > +' > + > +test_done As I said before, this doesn't actually test the case when the file sizes match. However, given the way that the code has changed the actual file sizes are not compared, so perhaps this doesn't matter. Thanks for all your help investigating this. Mike.
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On Wed, Mar 01, 2017 at 01:54:26PM -0800, Junio C Hamano wrote: > -- >8 -- > Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in > diff_populate_filespec() Thanks, this is well-explained, and the new comments in the code really help. I wondered if we should be checking would_convert_to_git() in reuse_worktree_file(), but we already do. It's just that we may still end up in this code-path when we're _actually_ diffing the working tree file, not just trying to optimize. > diff --git a/diff.c b/diff.c > index 8c78fce49d..dc51dceb44 100644 > --- a/diff.c > +++ b/diff.c > @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, > unsigned int flags) > s->should_free = 1; > return 0; > } > - if (size_only) > + > + /* > + * Even if the caller would be happy with getting > + * only the size, we cannot return early at this > + * point if the path requires us to run the content > + * conversion. > + */ > + if (!would_convert_to_git(s->path) && size_only) > return 0; The would_convert_to_git() function is a little expensive (it may have to do an attribute lookup). It may be worth swapping the two halves of the conditional here to get the short-circuit. It may not matter much in practice, though, because in the !size_only case we'd make the same query lower a few lines later (and in theory expensive bits of the attr lookup are cached). > + > + /* > + * Note: this check uses xsize_t(st.st_size) that may > + * not be the true size of the blob after it goes > + * through convert_to_git(). This may not strictly be > + * correct, but the whole point of big_file_threashold s/threashold/threshold/ > + * and is_binary check being that we want to avoid > + * opening the file and inspecting the contents, this > + * is probably fine. > + */ > if ((flags & CHECK_BINARY) && > s->size > big_file_threshold && s->is_binary == -1) { > s->is_binary = 1; I'm trying to think how this "not strictly correct" could bite us. For line-ending conversion, I'd say that the before/after are going to be approximately the same size. But what about something like LFS? If I have a 600MB file that convert_to_git() filters into a short LFS pointer, I think this changes the behavior. Before, we would diff the pointer file, but now we'll get "binary file changed". I wonder if we should take the opposite approach, and ignore big_file_threshold for converted files. One assumes that such gigantic files are binary, and therefore do not have line endings to convert. And any filtering has a reasonable chance of condensing them to something much smaller. I dunno. I'm sure somebody has some horrific 500MB-filtering example that can prove me wrong. -Peff
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Mike Crowewrites: > With the above patch, both "git diff" and "git diff --quiet" report that > there are no changes. Previously Git would report the extra newline > correctly. I sent an updated one that (I think) fixes the real issue, which the extra would_convert_to_git() calls added in the older iteration to diff_filespec_check_stat_unmatch() were merely papering over. It would be nice to see if it fixes the issue for you. Thanks.
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
Now I thought about it through a bit more thoroughly, I think this is the right approach, so here is my (tenative) final version. I seem to be getty really rusty---after all the codepaths involved are practically all my code and I should have noticed the real culprit during my first attempt X-<. Thanks for helping. -- >8 -- Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec() Callers of diff_populate_filespec() can choose to ask only for the size of the blob without grabbing the blob data, and the function, after running lstat() when the filespec points at a working tree file, returns by copying the value in size field of the stat structure into the size field of the filespec when this is the case. However, this short-cut cannot be taken if the contents from the path needs to go through convert_to_git(), whose resulting real blob data may be different from what is in the working tree file. As "git diff --quiet" compares the .size fields of filespec structures to skip content comparison, this bug manifests as a false "there are differences" for a file that needs eol conversion, for example. Reported-by: Mike CroweHelped-by: Torsten Bögershausen Signed-off-by: Junio C Hamano --- diff.c| 19 ++- t/t0028-diff-converted.sh | 27 +++ 2 files changed, 45 insertions(+), 1 deletion(-) create mode 100755 t/t0028-diff-converted.sh diff --git a/diff.c b/diff.c index 8c78fce49d..dc51dceb44 100644 --- a/diff.c +++ b/diff.c @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags) s->should_free = 1; return 0; } - if (size_only) + + /* +* Even if the caller would be happy with getting +* only the size, we cannot return early at this +* point if the path requires us to run the content +* conversion. +*/ + if (!would_convert_to_git(s->path) && size_only) return 0; + + /* +* Note: this check uses xsize_t(st.st_size) that may +* not be the true size of the blob after it goes +* through convert_to_git(). This may not strictly be +* correct, but the whole point of big_file_threashold +* and is_binary check being that we want to avoid +* opening the file and inspecting the contents, this +* is probably fine. +*/ if ((flags & CHECK_BINARY) && s->size > big_file_threshold && s->is_binary == -1) { s->is_binary = 1; diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh new file mode 100755 index 00..3d5ab9565b --- /dev/null +++ b/t/t0028-diff-converted.sh @@ -0,0 +1,27 @@ +#!/bin/sh +# +# Copyright (c) 2017 Mike Crowe +# +# These tests ensure that files changing line endings in the presence +# of .gitattributes to indicate that line endings should be ignored +# don't cause 'git diff' or 'git diff --quiet' to think that they have +# been changed. + +test_description='git diff with files that require CRLF conversion' + +. ./test-lib.sh + +test_expect_success setup ' + echo "* text=auto" >.gitattributes && + printf "Hello\r\nWorld\r\n" >crlf.txt && + git add .gitattributes crlf.txt && + git commit -m "initial" +' + +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' ' + printf "Hello\r\nWorld\n" >crlf.txt && + git status && + git diff --quiet +' + +test_done -- 2.12.0-319-gc5f21175ee
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
On Wednesday 01 March 2017 at 18:04:44 +0100, tbo...@web.de wrote: > From: Junio C Hamano> > git diff --quiet may take a short-cut to see if a file is changed > in the working tree: > Whenever the file size differs from what is recorded in the index, > the file is assumed to be changed and git diff --quiet returns > exit with code 1 > > This shortcut must be suppressed whenever the line endings are converted > or a filter is in use. > The attributes say "* text=auto" and a file has > "Hello\nWorld\n" in the index with a length of 12. > The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14. > (Or even "Hello\r\nWorld\n"). > In this case "git add" will not do any changes to the index, and > "git diff -quiet" should exit 0. > > Add calls to would_convert_to_git() before blindly saying that a different > size means different content. > > Reported-By: Mike Crowe > Signed-off-by: Torsten Bögershausen > --- > This is what I can come up with, collecting all the loose ends. > I'm not sure if Mike wan't to have the Reported-By with a > Signed-off-by ? > The other question is, if the commit message summarizes the discussion > well enough ? > > diff.c| 18 ++ > t/t0028-diff-converted.sh | 27 +++ > 2 files changed, 41 insertions(+), 4 deletions(-) > create mode 100755 t/t0028-diff-converted.sh > > diff --git a/diff.c b/diff.c > index 051761b..c264758 100644 > --- a/diff.c > +++ b/diff.c > @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct > diff_filepair *p) >*differences. >* >* 2. At this point, the file is known to be modified, > - *with the same mode and size, and the object > - *name of one side is unknown. Need to inspect > - *the identical contents. > + *with the same mode and size, the object > + *name of one side is unknown, or size comparison > + *cannot be depended upon. Need to inspect the > + *contents. >*/ > if (!DIFF_FILE_VALID(p->one) || /* (1) */ > !DIFF_FILE_VALID(p->two) || > @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct > diff_filepair *p) > (p->one->mode != p->two->mode) || > diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || > diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || > - (p->one->size != p->two->size) || > + > + /* > + * only if eol and other conversions are not involved, > + * we can say that two contents of different sizes > + * cannot be the same without checking their contents. > + */ > + (!would_convert_to_git(p->one->path) && > + !would_convert_to_git(p->two->path) && > + (p->one->size != p->two->size)) || > + > !diff_filespec_is_identical(p->one, p->two)) /* (2) */ > p->skip_stat_unmatch_result = 1; > return p->skip_stat_unmatch_result; > diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh > new file mode 100755 > index 000..3d5ab95 > --- /dev/null > +++ b/t/t0028-diff-converted.sh > @@ -0,0 +1,27 @@ > +#!/bin/sh > +# > +# Copyright (c) 2017 Mike Crowe > +# > +# These tests ensure that files changing line endings in the presence > +# of .gitattributes to indicate that line endings should be ignored > +# don't cause 'git diff' or 'git diff --quiet' to think that they have > +# been changed. > + > +test_description='git diff with files that require CRLF conversion' > + > +. ./test-lib.sh > + > +test_expect_success setup ' > + echo "* text=auto" >.gitattributes && > + printf "Hello\r\nWorld\r\n" >crlf.txt && > + git add .gitattributes crlf.txt && > + git commit -m "initial" > +' > + > +test_expect_success 'quiet diff works on file with line-ending change that > has no effect on repository' ' > + printf "Hello\r\nWorld\n" >crlf.txt && > + git status && > + git diff --quiet > +' > + > +test_done Hi Torsten, Thanks for investigating this. I think that you've simplified the test to the point where it doesn't entirely prove the fix. Although you test the case where the file has changed size, you don't test the case where it hasn't. Unfortunately that was the part of my test that could only reproduce the problem with the sleeps. Maybe someone who understands how the cache works fully could explain an alternative way to force the cache not to be used. Also, I think I've found a behaviour change with this fix. Consider: echo "* text=auto" >.gitattributes printf "Hello\r\nWorld\r\n" >crlf.txt git add .gitattributes crlf.txt git commit -m "initial" printf "\r\n" >>crlf.txt With the above patch, both "git diff" and "git diff --quiet" report that there are no changes. Previously Git would report the extra newline correctly. Mike.
Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
tbo...@web.de writes: > From: Junio C Hamano> > git diff --quiet may take a short-cut to see if a file is changed > in the working tree: > Whenever the file size differs from what is recorded in the index, > the file is assumed to be changed and git diff --quiet returns > exit with code 1 > > This shortcut must be suppressed whenever the line endings are converted > or a filter is in use. > The attributes say "* text=auto" and a file has > "Hello\nWorld\n" in the index with a length of 12. > The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14. > (Or even "Hello\r\nWorld\n"). > In this case "git add" will not do any changes to the index, and > "git diff -quiet" should exit 0. The thing I find the most disturbing is that at this point in the flow, p->one->size and p->two->size are supposed to be the sizes of the blob object, not the contents of the file on the working tree. IOW, p->two->size being 14 in the above example sounds like pointing at a different bug, if it is 14. The early return in diff_populate_filespec(), where it does s->size = xsize_t(st.st_size); ... if (size_only) return 0; way before it runs convert_to_git(), may be the real culprit. I am wondering if the real fix would be to do this, instead of the two extra would_convert_to_git() call there in the patch you sent. The result seems to still pass the new test in your patch. Thanks for helping. diff.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/diff.c b/diff.c index 8c78fce49d..dc51dceb44 100644 --- a/diff.c +++ b/diff.c @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags) s->should_free = 1; return 0; } - if (size_only) + + /* +* Even if the caller would be happy with getting +* only the size, we cannot return early at this +* point if the path requires us to run the content +* conversion. +*/ + if (!would_convert_to_git(s->path) && size_only) return 0; + + /* +* Note: this check uses xsize_t(st.st_size) that may +* not be the true size of the blob after it goes +* through convert_to_git(). This may not strictly be +* correct, but the whole point of big_file_threashold +* and is_binary check is that we want to avoid +* opening the file and inspecting the contents, so +* this is probably fine. +*/ if ((flags & CHECK_BINARY) && s->size > big_file_threshold && s->is_binary == -1) { s->is_binary = 1;
[PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions
From: Junio C Hamanogit diff --quiet may take a short-cut to see if a file is changed in the working tree: Whenever the file size differs from what is recorded in the index, the file is assumed to be changed and git diff --quiet returns exit with code 1 This shortcut must be suppressed whenever the line endings are converted or a filter is in use. The attributes say "* text=auto" and a file has "Hello\nWorld\n" in the index with a length of 12. The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14. (Or even "Hello\r\nWorld\n"). In this case "git add" will not do any changes to the index, and "git diff -quiet" should exit 0. Add calls to would_convert_to_git() before blindly saying that a different size means different content. Reported-By: Mike Crowe Signed-off-by: Torsten Bögershausen --- This is what I can come up with, collecting all the loose ends. I'm not sure if Mike wan't to have the Reported-By with a Signed-off-by ? The other question is, if the commit message summarizes the discussion well enough ? diff.c| 18 ++ t/t0028-diff-converted.sh | 27 +++ 2 files changed, 41 insertions(+), 4 deletions(-) create mode 100755 t/t0028-diff-converted.sh diff --git a/diff.c b/diff.c index 051761b..c264758 100644 --- a/diff.c +++ b/diff.c @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p) *differences. * * 2. At this point, the file is known to be modified, -*with the same mode and size, and the object -*name of one side is unknown. Need to inspect -*the identical contents. +*with the same mode and size, the object +*name of one side is unknown, or size comparison +*cannot be depended upon. Need to inspect the +*contents. */ if (!DIFF_FILE_VALID(p->one) || /* (1) */ !DIFF_FILE_VALID(p->two) || @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct diff_filepair *p) (p->one->mode != p->two->mode) || diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || - (p->one->size != p->two->size) || + + /* +* only if eol and other conversions are not involved, +* we can say that two contents of different sizes +* cannot be the same without checking their contents. +*/ + (!would_convert_to_git(p->one->path) && +!would_convert_to_git(p->two->path) && +(p->one->size != p->two->size)) || + !diff_filespec_is_identical(p->one, p->two)) /* (2) */ p->skip_stat_unmatch_result = 1; return p->skip_stat_unmatch_result; diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh new file mode 100755 index 000..3d5ab95 --- /dev/null +++ b/t/t0028-diff-converted.sh @@ -0,0 +1,27 @@ +#!/bin/sh +# +# Copyright (c) 2017 Mike Crowe +# +# These tests ensure that files changing line endings in the presence +# of .gitattributes to indicate that line endings should be ignored +# don't cause 'git diff' or 'git diff --quiet' to think that they have +# been changed. + +test_description='git diff with files that require CRLF conversion' + +. ./test-lib.sh + +test_expect_success setup ' + echo "* text=auto" >.gitattributes && + printf "Hello\r\nWorld\r\n" >crlf.txt && + git add .gitattributes crlf.txt && + git commit -m "initial" +' + +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' ' + printf "Hello\r\nWorld\n" >crlf.txt && + git status && + git diff --quiet +' + +test_done -- 2.10.0
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
Torsten Bögershausenwrites: > On 2017-02-27 21:17, Junio C Hamano wrote: > >> Torsten, you've been quite active in fixing various glitches around >> the EOL conversion in the latter half of last year. Have any >> thoughts to share on this topic? >> >> Thanks. > > Sorry for the delay, being too busy with other things. > I followed the discussion, but didn't have good things to contribute. > I am not an expert in diff.c, but there seems to be a bug, thanks everybody > for digging. > > Back to business: > > My understanding is that git diff --quiet should be quiet, when > git add will not do anything. Yes, I think that is a sensible criterion. What I was interested to hear from you the most was to double check if Mike's expectation is reasonable. Earlier we had a lengthy discussion on what to do when convert-to-git and convert-to-working-tree conversions do not round trip, and I was wondering if this was one of those cases.
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
On 2017-02-27 21:17, Junio C Hamano wrote: > Torsten, you've been quite active in fixing various glitches around > the EOL conversion in the latter half of last year. Have any > thoughts to share on this topic? > > Thanks. Sorry for the delay, being too busy with other things. I followed the discussion, but didn't have good things to contribute. I am not an expert in diff.c, but there seems to be a bug, thanks everybody for digging. Back to business: My understanding is that git diff --quiet should be quiet, when git add will not do anything (but the file is "touched". The touched means that Git will detect e.g a new mtime or inode or file size when doing lstat(). mtime is tricky, inodes are problematic under Windows. What is easy to change is the file length. I don't thing that we need a test file with LF, nor do we need the sleep, touch or anything. Would the the following work ? (This is copy-paste, so the TABs may be corrupted) #!/bin/sh # # Copyright (c) 2017 Mike Crowe # # These tests ensure that files changing line endings in the presence # of .gitattributes to indicate that line endings should be ignored # don't cause 'git diff' or 'git diff --quiet' to think that they have # been changed. test_description='git diff with files that require CRLF conversion' . ./test-lib.sh test_expect_success setup ' echo "* text=auto" > .gitattributes && printf "Hello\r\nWorld\r\n" >crlf.txt && git add .gitattributes crlf.txt lf.txt && git commit -m "initial" ' test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' ' printf "Hello\r\nWorld\n" >crlf.txt && git status && git diff --quiet ' test_done
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
Torsten, you've been quite active in fixing various glitches around the EOL conversion in the latter half of last year. Have any thoughts to share on this topic? Thanks. Mike Crowewrites: > On Monday 20 February 2017 at 13:25:01 -0800, Junio C Hamano wrote: >> This almost makes me suspect that the place that checks lengths of >> one and two in order to refrain from running more expensive content >> comparison you found earlier need to ask would_convert_to_git() >> before taking the short-cut, something along the lines of this (for >> illustration purposes only, not even compile-tested). The "almost" >> comes to me because I do not offhand know the performance implications >> of making calls to would_convert_to_git() here. >> >> diff.c | 18 ++ >> 1 file changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/diff.c b/diff.c >> index 051761be40..094d5913da 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct >> diff_filepair *p) >> *differences. >> * >> * 2. At this point, the file is known to be modified, >> - *with the same mode and size, and the object >> - *name of one side is unknown. Need to inspect >> - *the identical contents. >> + *with the same mode and size, the object >> + *name of one side is unknown, or size comparison >> + *cannot be depended upon. Need to inspect the >> + *contents. >> */ >> if (!DIFF_FILE_VALID(p->one) || /* (1) */ >> !DIFF_FILE_VALID(p->two) || >> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct >> diff_filepair *p) >> (p->one->mode != p->two->mode) || >> diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || >> diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || >> -(p->one->size != p->two->size) || >> + >> +/* >> + * only if eol and other conversions are not involved, >> + * we can say that two contents of different sizes >> + * cannot be the same without checking their contents. >> + */ >> +(!would_convert_to_git(p->one->path) && >> + !would_convert_to_git(p->two->path) && >> + (p->one->size != p->two->size)) || >> + >> !diff_filespec_is_identical(p->one, p->two)) /* (2) */ >> p->skip_stat_unmatch_result = 1; >> return p->skip_stat_unmatch_result; >> >> > > Thanks for investigating this. I think you are correct that I was misguided > in my previous "fix". However, your change above does fix the problem for > me. > > It looks like the main cost of convert_to_git is in convert_attrs which > ends up doing various path operations in attr.c. After that, both > apply_filter and crlf_to_git return straight away if there's nothing to do. > > I experimented several times with running "git diff -quiet" after touching > every file in Git's own worktree and any difference in total time was lost > in the noise. > > I've further improved my test case. Tests 3 and 4 fail without the above > change but pass with it. Unfortunately I'm still unable to get those tests > to fail without the above fix unless the sleeps are present. I've tried > using the "touch -r .datetime" technique from racy-git.txt but it doesn't > help. It seems that I'm unable to stop Git from using its cache without > sleeping. :( > > diff --git a/t/t4063-diff-converted.sh b/t/t4063-diff-converted.sh > new file mode 100755 > index 000..31a730d > --- /dev/null > +++ b/t/t4063-diff-converted.sh > @@ -0,0 +1,44 @@ > +#!/bin/sh > +# > +# Copyright (c) 2017 Mike Crowe > +# > +# These tests ensure that files changing line endings in the presence > +# of .gitattributes to indicate that line endings should be ignored > +# don't cause 'git diff' or 'git diff --quiet' to think that they have > +# been changed. > +# > +# The sleeps are necessary to reproduce the problem for reasons that I > +# don't understand. > + > +test_description='git diff with files that require CRLF conversion' > + > +. ./test-lib.sh > + > +test_expect_success setup ' > + echo "* text=auto" > .gitattributes && > + printf "Hello\r\nWorld\r\n" > crlf.txt && > + printf "Hello\nWorld\n" > lf.txt && > + git add .gitattributes crlf.txt lf.txt && > + git commit -m "initial" && echo three > +' > +test_expect_success 'noisy diff works on file that requires CRLF conversion' > ' > + git status >/dev/null && > + git diff >/dev/null && > + sleep 1 && > + touch crlf.txt lf.txt && > + git diff >/dev/null > +' > +test_expect_success 'quiet diff works on file that requires CRLF conversion > with no changes' ' > + git status && > + git diff --quiet && > + sleep 1 && > + touch crlf.txt lf.txt && > + git diff --quiet > +' > + > +test_expect_success 'quiet diff works on file with line-ending change that > has no effect on repository' ' > + printf
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
On Monday 20 February 2017 at 13:25:01 -0800, Junio C Hamano wrote: > This almost makes me suspect that the place that checks lengths of > one and two in order to refrain from running more expensive content > comparison you found earlier need to ask would_convert_to_git() > before taking the short-cut, something along the lines of this (for > illustration purposes only, not even compile-tested). The "almost" > comes to me because I do not offhand know the performance implications > of making calls to would_convert_to_git() here. > > diff.c | 18 ++ > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/diff.c b/diff.c > index 051761be40..094d5913da 100644 > --- a/diff.c > +++ b/diff.c > @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct > diff_filepair *p) >*differences. >* >* 2. At this point, the file is known to be modified, > - *with the same mode and size, and the object > - *name of one side is unknown. Need to inspect > - *the identical contents. > + *with the same mode and size, the object > + *name of one side is unknown, or size comparison > + *cannot be depended upon. Need to inspect the > + *contents. >*/ > if (!DIFF_FILE_VALID(p->one) || /* (1) */ > !DIFF_FILE_VALID(p->two) || > @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct > diff_filepair *p) > (p->one->mode != p->two->mode) || > diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || > diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || > - (p->one->size != p->two->size) || > + > + /* > + * only if eol and other conversions are not involved, > + * we can say that two contents of different sizes > + * cannot be the same without checking their contents. > + */ > + (!would_convert_to_git(p->one->path) && > + !would_convert_to_git(p->two->path) && > + (p->one->size != p->two->size)) || > + > !diff_filespec_is_identical(p->one, p->two)) /* (2) */ > p->skip_stat_unmatch_result = 1; > return p->skip_stat_unmatch_result; > > Thanks for investigating this. I think you are correct that I was misguided in my previous "fix". However, your change above does fix the problem for me. It looks like the main cost of convert_to_git is in convert_attrs which ends up doing various path operations in attr.c. After that, both apply_filter and crlf_to_git return straight away if there's nothing to do. I experimented several times with running "git diff -quiet" after touching every file in Git's own worktree and any difference in total time was lost in the noise. I've further improved my test case. Tests 3 and 4 fail without the above change but pass with it. Unfortunately I'm still unable to get those tests to fail without the above fix unless the sleeps are present. I've tried using the "touch -r .datetime" technique from racy-git.txt but it doesn't help. It seems that I'm unable to stop Git from using its cache without sleeping. :( diff --git a/t/t4063-diff-converted.sh b/t/t4063-diff-converted.sh new file mode 100755 index 000..31a730d --- /dev/null +++ b/t/t4063-diff-converted.sh @@ -0,0 +1,44 @@ +#!/bin/sh +# +# Copyright (c) 2017 Mike Crowe +# +# These tests ensure that files changing line endings in the presence +# of .gitattributes to indicate that line endings should be ignored +# don't cause 'git diff' or 'git diff --quiet' to think that they have +# been changed. +# +# The sleeps are necessary to reproduce the problem for reasons that I +# don't understand. + +test_description='git diff with files that require CRLF conversion' + +. ./test-lib.sh + +test_expect_success setup ' + echo "* text=auto" > .gitattributes && + printf "Hello\r\nWorld\r\n" > crlf.txt && + printf "Hello\nWorld\n" > lf.txt && + git add .gitattributes crlf.txt lf.txt && + git commit -m "initial" && echo three +' +test_expect_success 'noisy diff works on file that requires CRLF conversion' ' + git status >/dev/null && + git diff >/dev/null && + sleep 1 && + touch crlf.txt lf.txt && + git diff >/dev/null +' +test_expect_success 'quiet diff works on file that requires CRLF conversion with no changes' ' + git status && + git diff --quiet && + sleep 1 && + touch crlf.txt lf.txt && + git diff --quiet +' + +test_expect_success 'quiet diff works on file with line-ending change that has no effect on repository' ' + printf "Hello\nWorld\n" > crlf.txt && + printf "Hello\r\nWorld\r\n" > lf.txt && + git diff --quiet +' +test_done Mike.
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
Mike Crowewrites: > I think that if there's a risk that file contents will undergo conversion > then this should force the diff to check the file contents. Something like: > > diff --git a/diff.c b/diff.c > index 051761b..bee1662 100644 > --- a/diff.c > +++ b/diff.c > @@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options) > /* >* Most of the time we can say "there are changes" >* only by checking if there are changed paths, but > - * --ignore-whitespace* options force us to look > - * inside contents. > + * --ignore-whitespace* options or text conversion > + * force us to look inside contents. >*/ > > if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) || > DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) || > - DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL)) > + DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) || > + DIFF_OPT_TST(options, ALLOW_TEXTCONV)) > DIFF_OPT_SET(options, DIFF_FROM_CONTENTS); > else > DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS); Thanks. You may be on the right track to find FROM_CONTENTS bit, but I think TEXTCONV bit is a red-herring. This part of diff.c caught my attention, while thinking about this topic: if (output_format & DIFF_FORMAT_NO_OUTPUT && DIFF_OPT_TST(options, EXIT_WITH_STATUS) && DIFF_OPT_TST(options, DIFF_FROM_CONTENTS)) { /* * run diff_flush_patch for the exit status. setting * options->file to /dev/null should be safe, because we * aren't supposed to produce any output anyway. */ and the body of this "if" statement loops over q->queue[]. It is about the "even though we prefer not having to format the patch because we are doing --quiet, we need to see if contents of one and two that we _know_ are different are made into the same thing when we compare them while ignoring various forms of whitespace changes". So one and two that are removed in earlier step because they were truly identical may not be penalized if you flip FROM_CONTENTS bit early on. Having said that, DIFF_FROM_CONTENTS is about all paths this options structure governs, not just paths that have eol conversion defined. When you say "diff --ignore-whitespace-change", that applies to all paths. The eol conversion is specified per-path, so ideally the FROM_CONTENTS bit should be flipped if and only if one or more of the paths would need the conversion (i.e. does the helper function would_convert_to_git() say "yes" to the path?). I however suspect that necessary information to do so (i.e. "which paths we are looking at?") has not been generated yet at the point of the code you quoted. setup comes (and must come) very early, and then q->queue[] is populated by different front-end functions that compare trees, the index, and the working tree, depending on the "git diff" option or "git diff-{tree,index,files}" plumbing command, and you can ask "would one of these paths need conversion?" only after q->queue[] is populated. Hmm. Another thing is that ALLOW_TEXTCONV is not a right bit to check for your example. It is "do we use textconv filters to turn binary files into a (phony) text representation before comparing?". People use the mechanism to throw JPEG photos in Git and have textconv filter to extract only EXIF data, and "diff --textconv" would let us textually compare only the EXIF data (which is the only human readable part of the contents anyway). It might be a good idea to also flip FROM_CONTENTS bit under "diff --textconv", and ... > This solves the problem for me and my test case now passes. ... but I suspect that you were misled to think it fixes your issue, only because "--textconv" is by default enabled for "git diff" and "git log" (see "git diff --help"). I think you are saying that if you always set FROM_CONTENTS bit on, you get what you want. But that is to be expected and it unfortunately penalizes everybody by turning an obvious optimization permanently off. Also "--textconv" is not on by default for the plumbing "git diff-index" command and its friends, so it is likely that "git diff-index HEAD" with your change will still not work as you expect. A cheap (from code-change point of view) band-aid might be to flip FROM_CONTENTS on if we know the repository has _some_ paths that need eol conversion, even when the particular "diff" we are taking does not involve any eol conversion (e.g. "is core.crlf set?"). While it may be better than "if we are porcelain (aka ALLOW_TEXTCONV is set), unconditionally flip FROM_CONTENTS on", it is not ideal, either. This almost makes me suspect that the place that checks lengths of one and two in order to refrain from running more expensive content comparison you found earlier need to ask would_convert_to_git() before taking the short-cut, something along the lines of
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
On Friday 17 February 2017 at 22:19:58 +, Mike Crowe wrote: > On Friday 17 February 2017 at 14:05:17 -0800, Junio C Hamano wrote: > > Mike Crowewrites: > > > > > If "git diff --quiet" finds it necessary to compare actual file contents, > > > and a file requires CRLF conversion, then it incorrectly exits with an > > > exit > > > code of 1 even if there have been no changes. > > > > > > The patch below adds a test file that shows the problem. > > > > If "git diff" does not show any output and "git diff --exit-code" or > > "git diff --quiet" says there are differences, then it is a bug. > > > > I would however have expected that any culprit would involve a code > > that says "under QUICK option, we do not have to bother doing > > this". The part you quoted: > > > > > if (!DIFF_FILE_VALID(p->one) || /* (1) */ > > > !DIFF_FILE_VALID(p->two) || > > > (p->one->oid_valid && p->two->oid_valid) || > > > (p->one->mode != p->two->mode) || > > > diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || > > > diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || > > > (p->one->size != p->two->size) || > > > !diff_filespec_is_identical(p->one, p->two)) /* (2) */ > > > p->skip_stat_unmatch_result = 1; > > > > is used by "git diff" with and without "--quiet", afacr, so I > > suspect that the bug lies somewhere else. > > I can't say that I really understand the code fully, but it appears that > the first pass generates a queue of files that contain differences. The > result of the quiet diff comes from the size of that queue, > diff_queued_diff.nr, being non-zero in diffcore_std. I'm assuming that the > result of the noisy diff comes from actually comparing the files. > > But, I've only spent a short while looking so I may have got the wrong end > of the stick. Tricking Git into checking the actual file contents (by passing --ignore-space-change for example) is sufficient to ensure that the exit status is never 1 when it should be zero. (Of course that option has other unwanted effects in this case.) I think that if there's a risk that file contents will undergo conversion then this should force the diff to check the file contents. Something like: diff --git a/diff.c b/diff.c index 051761b..bee1662 100644 --- a/diff.c +++ b/diff.c @@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options) /* * Most of the time we can say "there are changes" * only by checking if there are changed paths, but -* --ignore-whitespace* options force us to look -* inside contents. +* --ignore-whitespace* options or text conversion +* force us to look inside contents. */ if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) || DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) || - DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL)) + DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) || + DIFF_OPT_TST(options, ALLOW_TEXTCONV)) DIFF_OPT_SET(options, DIFF_FROM_CONTENTS); else DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS); This solves the problem for me and my test case now passes. Unfortunately it breaks the 'removing and adding subproject' test case in t3040-subprojects-basic at the line: test_expect_code 1 git diff -M --name-status --exit-code HEAD^ HEAD presumably because after the rename has been detected the file contents are identical. :( A rename of a single file appears to still be handled correctly. Mike.
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
On Friday 17 February 2017 at 14:05:17 -0800, Junio C Hamano wrote: > Mike Crowewrites: > > > If "git diff --quiet" finds it necessary to compare actual file contents, > > and a file requires CRLF conversion, then it incorrectly exits with an exit > > code of 1 even if there have been no changes. > > > > The patch below adds a test file that shows the problem. > > If "git diff" does not show any output and "git diff --exit-code" or > "git diff --quiet" says there are differences, then it is a bug. > > I would however have expected that any culprit would involve a code > that says "under QUICK option, we do not have to bother doing > this". The part you quoted: > > > if (!DIFF_FILE_VALID(p->one) || /* (1) */ > > !DIFF_FILE_VALID(p->two) || > > (p->one->oid_valid && p->two->oid_valid) || > > (p->one->mode != p->two->mode) || > > diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || > > diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || > > (p->one->size != p->two->size) || > > !diff_filespec_is_identical(p->one, p->two)) /* (2) */ > > p->skip_stat_unmatch_result = 1; > > is used by "git diff" with and without "--quiet", afacr, so I > suspect that the bug lies somewhere else. I can't say that I really understand the code fully, but it appears that the first pass generates a queue of files that contain differences. The result of the quiet diff comes from the size of that queue, diff_queued_diff.nr, being non-zero in diffcore_std. I'm assuming that the result of the noisy diff comes from actually comparing the files. But, I've only spent a short while looking so I may have got the wrong end of the stick. Thanks. Mike.
Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
Mike Crowewrites: > If "git diff --quiet" finds it necessary to compare actual file contents, > and a file requires CRLF conversion, then it incorrectly exits with an exit > code of 1 even if there have been no changes. > > The patch below adds a test file that shows the problem. If "git diff" does not show any output and "git diff --exit-code" or "git diff --quiet" says there are differences, then it is a bug. I would however have expected that any culprit would involve a code that says "under QUICK option, we do not have to bother doing this". The part you quoted: > if (!DIFF_FILE_VALID(p->one) || /* (1) */ > !DIFF_FILE_VALID(p->two) || > (p->one->oid_valid && p->two->oid_valid) || > (p->one->mode != p->two->mode) || > diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || > diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || > (p->one->size != p->two->size) || > !diff_filespec_is_identical(p->one, p->two)) /* (2) */ > p->skip_stat_unmatch_result = 1; is used by "git diff" with and without "--quiet", afacr, so I suspect that the bug lies somewhere else.
git diff --quiet exits with 1 on clean tree with CRLF conversions
If "git diff --quiet" finds it necessary to compare actual file contents, and a file requires CRLF conversion, then it incorrectly exits with an exit code of 1 even if there have been no changes. The patch below adds a test file that shows the problem. The first test of diff without --quiet correctly has an exit status of zero on both invocations. The second test of diff with --quiet has an exit code of zero on the first invocation, but an exit code of one on the second invocation. Further invocations (not included in the test) may yield an exit code of 1. Calling "git status" always fixes things. (The touching with "tomorrow" was my attempt to avoid the sleep, but that didn't seem to help - it appears that time must pass in order to ensure that the cache is not used.) The culprit would appear to be in diff_filespec_check_stat_unmatch where it assumes that the files are different if their sizes are different: if (!DIFF_FILE_VALID(p->one) || /* (1) */ !DIFF_FILE_VALID(p->two) || (p->one->oid_valid && p->two->oid_valid) || (p->one->mode != p->two->mode) || diff_populate_filespec(p->one, CHECK_SIZE_ONLY) || diff_populate_filespec(p->two, CHECK_SIZE_ONLY) || (p->one->size != p->two->size) || !diff_filespec_is_identical(p->one, p->two)) /* (2) */ p->skip_stat_unmatch_result = 1; In the failing case p->one->size == 14 and p->two->size == 12. Mike. diff --git a/t/t4063-diff-converted.sh b/t/t4063-diff-converted.sh new file mode 100755 index 000..a108dfb --- /dev/null +++ b/t/t4063-diff-converted.sh @@ -0,0 +1,32 @@ +#!/bin/sh +# +# Copyright (c) 2017 Mike Crowe +# + +test_description='git diff with files that require CRLF conversion' + +. ./test-lib.sh + +test_expect_success setup ' + echo "* text=auto" > .gitattributes && + printf "Hello\r\nWorld\r\n" > crlf.txt && + git add .gitattributes crlf.txt && + git commit -m "initial" +' +test_expect_success 'noisy diff works on file that requires CRLF conversion' ' + git status >/dev/null && + git diff >/dev/null && + sleep 1 && + touch --date=tomorrow crlf.txt && + git diff >/dev/null +' +# The sleep is required for reasons I don't fully understand +test_expect_success 'quiet diff works on file that requires CRLF conversion with no changes' ' + git status && + git diff --quiet && + sleep 1 && + touch --date=tomorrow crlf.txt && + git diff --quiet +' + +test_done