Re: Consider escaping special characters like 'less' does
On Sun, Oct 22, 2017 at 08:27:20AM -0400, Jason Pyeron wrote: > > ESC (for color) > > + if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) { > > What about CR [0x0D] ? I assumed that CR was one of the things we'd want to avoid (and it was in fact what I used to test this). E.g., try: echo base >file git add file printf 'foo\r+bar\n' >file Running through "less" shows something like: diff --git a/file b/file index df967b9..5b6ee80 100644 --- a/file +++ b/file @@ -1 +1 @@ -base +foo^M+bar but "git add -p" shows: $ git add -p diff --git a/file b/file index df967b9..5b6ee80 100644 --- a/file +++ b/file @@ -1 +1 @@ -base +bar Stage this hunk [y,n,q,a,d,/,e,?]? For systems where CRLF is the native line ending, we'd still expect to see only LFs here when line-ending conversion is on (since the diff shows the canonical in-repo form). For files where the CRs must remain for some reason, they'd show as a "?" at the end. The "^M" shown by "less" is a bit more informative. If we really do want to pursue this direction, it might make sense to use more descriptive placeholders. -Peff
RE: Consider escaping special characters like 'less' does
> -Original Message- > From: Jeff King > Sent: Monday, October 16, 2017 6:13 PM > To: Joris Valette > Cc: Andreas Schwab; Jason Pyeron; git@vger.kernel.org > Subject: Re: Consider escaping special characters like 'less' does > > > I.e., something like this would probably help your case > without hurting > anybody: > > diff --git a/git-add--interactive.perl b/git-add--interactive.perl > index 28b325d754..d44e5ea459 100755 > --- a/git-add--interactive.perl > +++ b/git-add--interactive.perl > @@ -714,6 +714,16 @@ sub parse_diff { > push @{$hunk[-1]{DISPLAY}}, > (@colored ? $colored[$i] : $diff[$i]); > } > + > + foreach my $hunk (@hunk) { > + foreach my $line (@{$hunk->{DISPLAY}}) { > + # all control chars minus newline and > ESC (for color) > + if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) { What about CR [0x0D] ? > + $hunk->{CONTROLCHARS} = 1; > + } > + } > + } > + > return @hunk; > } > > @@ -1407,6 +1417,9 @@ sub patch_update_file { > if ($hunk[$ix]{TYPE} eq 'hunk') { > $other .= ',e'; > } > + if ($hunk[$ix]->{CONTROLCHARS}) { > + print "warning: control characters in > hunk have been replaced by '?'\n"; > + } > for (@{$hunk[$ix]{DISPLAY}}) { > print; > } > > I can't help but feel this is the tip of a larger iceberg, > though. E.g., > what about characters outside of the terminal's correct encoding? Or > broken UTF-8 characters? > > -Peff >
Re: Consider escaping special characters like 'less' does
On Tue, Oct 17, 2017 at 10:13:34AM +0900, Junio C Hamano wrote: > Jeff King writes: > > > Alternatively, I suppose we could just always escape in > > add--interactive. I'm trying to think of a case where somebody would > > really want their diffFilter to see the raw bytes (or vice versa, to > > show raw bytes produced by their filter), but I'm having trouble coming > > up with one. > > Your patch below only implements the "tame the raw bytes that come > out of their filter", which is quite agreeable. Yes. I think that is probably OK, especially given that we continue to allow terminal escapes (certainly some filters would want to use colors; I don't know if any would want to use more exotic control codes). > > I can't help but feel this is the tip of a larger iceberg, though. E.g., > > what about characters outside of the terminal's correct encoding? Or > > broken UTF-8 characters? > > Hmph. If you use it as a "built-in" that is a fallback for > diffFilter, i.e. use it only when the end user does not have one, > then users can override whatever wrong thing the built-in logic does > so... ;-) Yes, and maybe that is the best way to do it. It just seems like it is opening a can of worms about exactly which things should be filtered and how. I also wondered if people would be annoyed that by using a filter, they don't get the benefit of the escaping, unless their filter implements it separately on top (and the original purpose of the filter option was for things like diff-highlight and diff-so-fancy, which do not do such escaping). -Peff
Re: Consider escaping special characters like 'less' does
Jeff King writes: > Alternatively, I suppose we could just always escape in > add--interactive. I'm trying to think of a case where somebody would > really want their diffFilter to see the raw bytes (or vice versa, to > show raw bytes produced by their filter), but I'm having trouble coming > up with one. Your patch below only implements the "tame the raw bytes that come out of their filter", which is quite agreeable. > I.e., something like this would probably help your case without hurting > anybody: > ... > > I can't help but feel this is the tip of a larger iceberg, though. E.g., > what about characters outside of the terminal's correct encoding? Or > broken UTF-8 characters? Hmph. If you use it as a "built-in" that is a fallback for diffFilter, i.e. use it only when the end user does not have one, then users can override whatever wrong thing the built-in logic does so... ;-)
Re: Consider escaping special characters like 'less' does
On Tue, Oct 17, 2017 at 12:47:01AM +0200, Andreas Schwab wrote: > On Okt 16 2017, Jeff King wrote: > > > I can't help but feel this is the tip of a larger iceberg, though. E.g., > > what about characters outside of the terminal's correct encoding? Or > > broken UTF-8 characters? > > Or correctly encoded UTF-8 characters that look confusing? Or blobs > with embedded escape sequences? Yes, leaving ESC unfiltered here made me hesitate, too. We must allow it to show colors, but showing diffs with raw terminal codes can be quite confusing. My general advice on committing unprintable characters is: don't. -Peff
Re: Consider escaping special characters like 'less' does
On Okt 16 2017, Jeff King wrote: > I can't help but feel this is the tip of a larger iceberg, though. E.g., > what about characters outside of the terminal's correct encoding? Or > broken UTF-8 characters? Or correctly encoded UTF-8 characters that look confusing? Or blobs with embedded escape sequences? Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: Consider escaping special characters like 'less' does
On Sun, Oct 15, 2017 at 11:37:04PM +0200, Joris Valette wrote: > 2017-10-15 22:06 GMT+02:00 Jeff King : > > Git's diff generation will never do such escaping by default, because it > > means creating a patch that cannot be applied to get back the original > > content. > > Indeed this would be a problem. That's where my knowledge of git's > source code ends, but in that case, can't the output be discriminated > against the command that was executed? > Command that outputs an applicable patch -> don't escape > Command that outputs a diff to see changes -> escape Speaking in a general sense, people use "git diff" for both purposes, and we can't necessarily tell which[1]. As a matter of fact, that's a potential problem with textconv filters as well (which are enabled by default for git-diff, but not for plumbing like diff-tree, format-patch, etc). I think the feature isn't widely used enough for people to run into problems (and they also use it only on things that they don't _expect_ to be able to make patches for, since they're binaries). [1] Of course we can come up with heuristics, like "is stdout going to a a terminal"? But in such a case we already kick in the pager, and it does the exact escaping you're asking for. :) For a command like add--interactive, it knows which invocations are for showing to the user and which are for applying (and it already uses "--color" selectively, for instance). So if there were an "escape this" option in git's diff generation, we could selectively pass it. But again, I think the right solution there is not building the escaping into Git, but just passing it through another filter. > > It doesn't seem out of the question to me to have an out-of-the-box > > default for interactive.diffFilter which does some basic escaping (we > > could even implement it inside the perl script for efficiency). > > Yes I read this thread, but I was left unsatisfied because I would > like something out-of-the-box. > Your suggestion might be the best solution then: implement some > default escaping for interactive.diffFilter. Alternatively, I suppose we could just always escape in add--interactive. I'm trying to think of a case where somebody would really want their diffFilter to see the raw bytes (or vice versa, to show raw bytes produced by their filter), but I'm having trouble coming up with one. I.e., something like this would probably help your case without hurting anybody: diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 28b325d754..d44e5ea459 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -714,6 +714,16 @@ sub parse_diff { push @{$hunk[-1]{DISPLAY}}, (@colored ? $colored[$i] : $diff[$i]); } + + foreach my $hunk (@hunk) { + foreach my $line (@{$hunk->{DISPLAY}}) { + # all control chars minus newline and ESC (for color) + if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) { + $hunk->{CONTROLCHARS} = 1; + } + } + } + return @hunk; } @@ -1407,6 +1417,9 @@ sub patch_update_file { if ($hunk[$ix]{TYPE} eq 'hunk') { $other .= ',e'; } + if ($hunk[$ix]->{CONTROLCHARS}) { + print "warning: control characters in hunk have been replaced by '?'\n"; + } for (@{$hunk[$ix]{DISPLAY}}) { print; } I can't help but feel this is the tip of a larger iceberg, though. E.g., what about characters outside of the terminal's correct encoding? Or broken UTF-8 characters? -Peff
Re: Consider escaping special characters like 'less' does
2017-10-15 22:06 GMT+02:00 Jeff King : > Git's diff generation will never do such escaping by default, because it > means creating a patch that cannot be applied to get back the original > content. Indeed this would be a problem. That's where my knowledge of git's source code ends, but in that case, can't the output be discriminated against the command that was executed? Command that outputs an applicable patch -> don't escape Command that outputs a diff to see changes -> escape > There _are_ already options to create diffs that cannot be applied, like > --textconv. So it would be possible to add a similar option for > escaping. But I don't think we really need or want a separate option, > when you can already do one of: > > 1. If your files have special binary characters that are hard to see, > you can use the existing textconv system to do whatever escaping > you like. And then the Git will diff the result of the escaping, > which means you get readable diffs when they change. > > 2. Put the raw output of git's diff through a filter that escapes. We > already do this most of the time by piping through less. The most > noticeable exception is "add --patch". There you can set up a > program to filter as well. There's more information in a recent > thread here: > > > https://public-inbox.org/git/20171012184736.rglkbyryauwuv...@sigill.intra.peff.net/ > > It doesn't seem out of the question to me to have an out-of-the-box > default for interactive.diffFilter which does some basic escaping (we > could even implement it inside the perl script for efficiency). Yes I read this thread, but I was left unsatisfied because I would like something out-of-the-box. Your suggestion might be the best solution then: implement some default escaping for interactive.diffFilter. Joris
Re: Consider escaping special characters like 'less' does
On Sun, Oct 15, 2017 at 06:39:45PM +0200, Joris Valette wrote: > > It's your MUA's fault that you get a ?, the mail didn't contain any. > > Actually, the original mail contained the special BOM sequence but > it's generally invisible. His MUA shows it with a '?', mine doesn't > show anything, neither does Firefox on the mailing list page. > > The question remains: could escaping be done? Git's diff generation will never do such escaping by default, because it means creating a patch that cannot be applied to get back the original content. There _are_ already options to create diffs that cannot be applied, like --textconv. So it would be possible to add a similar option for escaping. But I don't think we really need or want a separate option, when you can already do one of: 1. If your files have special binary characters that are hard to see, you can use the existing textconv system to do whatever escaping you like. And then the Git will diff the result of the escaping, which means you get readable diffs when they change. 2. Put the raw output of git's diff through a filter that escapes. We already do this most of the time by piping through less. The most noticeable exception is "add --patch". There you can set up a program to filter as well. There's more information in a recent thread here: https://public-inbox.org/git/20171012184736.rglkbyryauwuv...@sigill.intra.peff.net/ It doesn't seem out of the question to me to have an out-of-the-box default for interactive.diffFilter which does some basic escaping (we could even implement it inside the perl script for efficiency). -Peff
Re: Consider escaping special characters like 'less' does
2017-10-15 17:46 GMT+02:00 Andreas Schwab : > On Okt 15 2017, "Jason Pyeron" wrote: > >>> -Original Message- >>> From: Joris Valette >>> Sent: Sunday, October 15, 2017 9:34 AM >>> To: git@vger.kernel.org >>> Subject: Consider escaping special characters like 'less' does >>> >>> The pager 'less' escapes some characters when calling 'git diff'. This >>> is what I might get: >>> >>> $ git diff --cached >>> diff --git a/some_file b/some_file >>> new file mode 100644 >>> index 000..357323f >>> --- /dev/null >>> +++ b/some_file >>> @@ -0,0 +1 @@ >>> +Hello >>> \ No newline at end of file >>> >>> This example is a simple file encoded in UTF-8 *with BOM*. >>> On the other hand, the built-in git output shows this: >>> >>> $ git --no-pager diff --cached >>> diff --git a/some_file b/some_file >>> new file mode 100644 >>> index 000..357323f >>> --- /dev/null >>> +++ b/some_file >>> @@ -0,0 +1 @@ >>> +?Hello >>> \ No newline at end of file >> >> It is your terminal, not git's fault that you get a ? rendered. > > It's your MUA's fault that you get a ?, the mail didn't contain any. Actually, the original mail contained the special BOM sequence but it's generally invisible. His MUA shows it with a '?', mine doesn't show anything, neither does Firefox on the mailing list page. The question remains: could escaping be done? Joris
Re: Consider escaping special characters like 'less' does
On Okt 15 2017, "Jason Pyeron" wrote: >> -Original Message- >> From: Joris Valette >> Sent: Sunday, October 15, 2017 9:34 AM >> To: git@vger.kernel.org >> Subject: Consider escaping special characters like 'less' does >> >> The pager 'less' escapes some characters when calling 'git diff'. This >> is what I might get: >> >> $ git diff --cached >> diff --git a/some_file b/some_file >> new file mode 100644 >> index 000..357323f >> --- /dev/null >> +++ b/some_file >> @@ -0,0 +1 @@ >> +Hello >> \ No newline at end of file >> >> This example is a simple file encoded in UTF-8 *with BOM*. >> On the other hand, the built-in git output shows this: >> >> $ git --no-pager diff --cached >> diff --git a/some_file b/some_file >> new file mode 100644 >> index 000..357323f >> --- /dev/null >> +++ b/some_file >> @@ -0,0 +1 @@ >> +?Hello >> \ No newline at end of file > > It is your terminal, not git's fault that you get a ? rendered. It's your MUA's fault that you get a ?, the mail didn't contain any. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
RE: Consider escaping special characters like 'less' does
> -Original Message- > From: Joris Valette > Sent: Sunday, October 15, 2017 9:34 AM > To: git@vger.kernel.org > Subject: Consider escaping special characters like 'less' does > > The pager 'less' escapes some characters when calling 'git diff'. This > is what I might get: > > $ git diff --cached > diff --git a/some_file b/some_file > new file mode 100644 > index 000..357323f > --- /dev/null > +++ b/some_file > @@ -0,0 +1 @@ > +Hello > \ No newline at end of file > > This example is a simple file encoded in UTF-8 *with BOM*. > On the other hand, the built-in git output shows this: > > $ git --no-pager diff --cached > diff --git a/some_file b/some_file > new file mode 100644 > index 000..357323f > --- /dev/null > +++ b/some_file > @@ -0,0 +1 @@ > +?Hello > \ No newline at end of file It is your terminal, not git's fault that you get a ? rendered. $ git diff diff --git a/test.txt b/test.txt index af9f93a..294e397 100644 --- a/test.txt +++ b/test.txt @@ -1 +1 @@ -the quick brown fox jumps over the lazy dog +the quick brown fox jumps over the lazy dog $ git diff | hexdump.exe -C 64 69 66 66 20 2d 2d 67 69 74 20 61 2f 74 65 73 |diff --git a/tes| 0010 74 2e 74 78 74 20 62 2f 74 65 73 74 2e 74 78 74 |t.txt b/test.txt| 0020 0a 69 6e 64 65 78 20 61 66 39 66 39 33 61 2e 2e |.index af9f93a..| 0030 32 39 34 65 33 39 37 20 31 30 30 36 34 34 0a 2d |294e397 100644.-| 0040 2d 2d 20 61 2f 74 65 73 74 2e 74 78 74 0a 2b 2b |-- a/test.txt.++| 0050 2b 20 62 2f 74 65 73 74 2e 74 78 74 0a 40 40 20 |+ b/test.txt.@@ | 0060 2d 31 20 2b 31 20 40 40 0a 2d 74 68 65 20 71 75 |-1 +1 @@.-the qu| 0070 69 63 6b 20 62 72 6f 77 6e 20 66 6f 78 20 6a 75 |ick brown fox ju| 0080 6d 70 73 20 6f 76 65 72 20 74 68 65 20 6c 61 7a |mps over the laz| Note: 0a is newline, 2b is + ef bb bf are the specials added to the file, 74 68 65 20 71 75 is 'the qu' 0090 79 20 64 6f 67 0a 2b ef bb bf 74 68 65 20 71 75 |y dog.+...the qu| 00a0 69 63 6b 20 62 72 6f 77 6e 20 66 6f 78 20 6a 75 |ick brown fox ju| 00b0 6d 70 73 20 6f 76 65 72 20 74 68 65 20 6c 61 7a |mps over the laz| 00c0 79 20 64 6f 67 0a |y dog.| 00c6 $ git diff | cat diff --git a/test.txt b/test.txt index af9f93a..294e397 100644 --- a/test.txt +++ b/test.txt @@ -1 +1 @@ -the quick brown fox jumps over the lazy dog +the quick brown fox jumps over the lazy dog $ git --no-pager diff diff --git a/test.txt b/test.txt index af9f93a..294e397 100644 --- a/test.txt +++ b/test.txt @@ -1 +1 @@ -the quick brown fox jumps over the lazy dog +the quick brown fox jumps over the lazy dog And my UTF-8 capable terminal displays it fine. What do you get when you pipe to hexdump? -Jason
Consider escaping special characters like 'less' does
The pager 'less' escapes some characters when calling 'git diff'. This is what I might get: $ git diff --cached diff --git a/some_file b/some_file new file mode 100644 index 000..357323f --- /dev/null +++ b/some_file @@ -0,0 +1 @@ +Hello \ No newline at end of file This example is a simple file encoded in UTF-8 *with BOM*. On the other hand, the built-in git output shows this: $ git --no-pager diff --cached diff --git a/some_file b/some_file new file mode 100644 index 000..357323f --- /dev/null +++ b/some_file @@ -0,0 +1 @@ +Hello \ No newline at end of file You can see 'less' shows explicitly, while it is implicit and hidden with git's built-in output. Other characters behave the same, like ZERO WIDTH SPACE , and I believe many others. This is particularly annoying when there are other changes on the same line. Here I add ' world!' but also remove the BOM: $ git diff diff --git a/some_file b/some_file index 357323f..6769dd6 100644 --- a/some_file +++ b/some_file @@ -1 +1 @@ -Hello \ No newline at end of file +Hello world! \ No newline at end of file Compare with: $ git --no-pager diff diff --git a/some_file b/some_file index 357323f..6769dd6 100644 --- a/some_file +++ b/some_file @@ -1 +1 @@ -Hello \ No newline at end of file +Hello world! \ No newline at end of file In the first example I can see both changes, in the second example the BOM removal is hidden and I won't notice it because there is indeed another genuine change on the same line. I'll add a third example with CRLF line-terminators: $ git diff diff --git a/some_file b/some_file index 357323f..dfd6895 100644 --- a/some_file +++ b/some_file @@ -1 +1 @@ -Hello \ No newline at end of file +Hello world!^M and: $ git --no-pager diff diff --git a/some_file b/some_file index 357323f..dfd6895 100644 --- a/some_file +++ b/some_file @@ -1 +1 @@ -Hello \ No newline at end of file +Hello world! My thoughts are that git should add this feature and tend towards less's behavior. A diff output should show as much as possible. I would happily use 'less' all the time, but unfortunately I don't think I can. For instance, 'git add --patch', which I use quite often, prints through git's built-in output, and I miss less's features. Joris Valette.