On Thu, Jun 18, 2015 at 05:23:56PM -0400, Jeff King wrote:

> +# Return the individual diff-able items of the hunk, but with any
> +# diff or color prefix/suffix for each line split out (we assume that the
> +# prefix/suffix for each line will be the same).
> +sub split_hunk {
> +     my ($prefix, $suffix, @r);
> +     foreach my $line (@_) {
> +             $line =~ /^($COLOR*[+-]$COLOR*)(.*)($COLOR*)/
> +                     or die "eh, this is supposed to match everything!";

This isn't quite right. We'll never match the suffix, because it gets
sucked up by the greedy (.*). But making that non-greedy matches
nothing, unless we also add "$" at the end.

_But_, there is still something else weird going on. I replaced this
split:

> +             push @r, split(//, $2), "\n";

with:

  push @r, split(/([[:space:][:punct:]])/, $2), "\n";

which behaves much better. With that, 99a2cfb shows:

  -       <if> (!peel_ref(path, peeled)) {
  -               <is>_annotated = !!<hashcmp>(<sha1>, peeled);
  +       <if> (!peel_ref(path, peeled<.hash>)) {
  +               <is>_annotated = !!<oidcmp>(<oid>, <&>peeled);

The latter half of both lines looks perfect. But what's that weird
highlighting of the initial "if" and "is" on those lines?

It turns out that the colored output we produce is quite odd:

  $ git show --color 99a2cfb | grep peel_ref | cat -A
  ^[[31m-^Iif (!peel_ref(path, peeled)) {^[[m$
  ^[[32m+^[[m^I^[[32mif (!peel_ref(path, peeled.hash)) {^[[m$

For the pre-image, we print the color, the "-", and then the line data.
Makes sense.

For the post-image, we print the color, "+", a reset, then the initial
whitespace, then the color again!

So of course the diff algorithm says "hey, there's this weird color in
here". The original implementation of diff-highlight didn't care,
because it skipped leading whitespace and colors as "boring". But this
one cannot do that. It pulls the strict prefix out of each line (and it
must, because it must get the same prefix for each line of a hunk).

I think git is actually wrong here; it's mingling the ANSI colors and
the actual content. Nobody ever noticed because it looks OK to a human,
and who would be foolish enough to try machine-parsing colorized diff
output?

The fix is:

diff --git a/diff.c b/diff.c
index 87b16d5..a80b5b4 100644
--- a/diff.c
+++ b/diff.c
@@ -501,9 +501,9 @@ static void emit_line_checked(const char *reset,
                emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
        else {
                /* Emit just the prefix, then the rest. */
-               emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
+               emit_line_0(ecbdata->opt, set, "", sign, "", 0);
                ws_check_emit(line, len, ecbdata->ws_rule,
-                             ecbdata->opt->file, set, reset, ws);
+                             ecbdata->opt->file, "", reset, ws);
        }
 }
 

But I'm a little worried it may interfere with the way the
whitespace-checker emits colors (e.g., it may emit its own colors for
the leading spaces, and we would need to re-assert our color before
showing the rest of the line). So I think you could also argue that
because of whitespace-highlighting, colorized diffs are fundamentally
going to have colors intermingled with the content and should not be
parsed this way.

All the more reason to try to move this inside diff.c, I guess.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to