On 21/03/18 05:59, Jeff King wrote:
> This patch fixes a corner case where diff-highlight may
> scramble some diffs when combined with --graph.
>
> Commit 7e4ffb4c17 (diff-highlight: add support for --graph
> output, 2016-08-29) taught diff-highlight to skip past the
> graph characters at the start of each line with this regex:
>
> ($COLOR?\|$COLOR?\s+)*
>
> I.e., any series of pipes separated by and followed by
> arbitrary whitespace. We need to match more than just a
> single space because the commit in question may be indented
> to accommodate other parts of the graph drawing. E.g.:
>
> * commit 1234abcd
> | ...
> | diff --git ...
>
> has only a single space, but for the last commit before a
> fork:
>
> | | |
> | * | commit 1234abcd
> | |/ ...
> | | diff --git
>
> the diff lines have more spaces between the pipes and the
> start of the diff.
>
> However, when we soak up all of those spaces with the
> $GRAPH regex, we may accidentally include the leading space
> for a context line. That means we may consider the actual
> contents of a context line as part of the diff syntax. In
> other words, something like this:
>
> normal context line
> -old line
> +new line
> -this is a context line with a leading dash
>
> would cause us to see that final context line as a removal
> line, and we'd end up showing the hunk in the wrong order:
>
> normal context line
> -old line
> -this is a context line with a leading dash
> +new line
>
> Instead, let's a be a little more clever about parsing the
> graph. We'll look for the actual "*" line that marks the
> start of a commit, and record the indentation we see there.
> Then we can skip past that indentation when checking whether
> the line is a hunk header, removal, addition, etc.
>
> There is one tricky thing: the indentation in bytes may be
> different for various lines of the graph due to coloring.
> E.g., the "*" on a commit line is generally shown without
> color, but on the actual diff lines, it will be replaced
> with a colorized "|" character, adding several bytes. We
> work around this here by counting "visible" bytes. This is
> unfortunately a bit more expensive, making us about twice as
> slow to handle --graph output. But since this is meant to be
> used interactively anyway, it's tolerably fast (and the
> non-graph case is unaffected).
>
> One alternative would be to search for hunk header lines and
> use their indentation (since they'd have the same colors as
> the diff lines which follow). But that just opens up
> different corner cases. If we see:
>
> | | @@ 1,2 1,3 @@
>
> we cannot know if this is a real diff that has been
> indented due to the graph, or if it's a context line that
> happens to look like a diff header. We can only be sure of
> the indent on the "*" lines, since we know those don't
> contain arbitrary data (technically the user could include a
> bunch of extra indentation via --format, but that's rare
> enough to disregard).
>
> Reported-by: Phillip Wood <[email protected]>
> Signed-off-by: Jeff King <[email protected]>
> ---
> contrib/diff-highlight/DiffHighlight.pm | 78 +++++++++++++++----
> .../diff-highlight/t/t9400-diff-highlight.sh | 28 +++++++
> 2 files changed, 91 insertions(+), 15 deletions(-)
>
> diff --git a/contrib/diff-highlight/DiffHighlight.pm
> b/contrib/diff-highlight/DiffHighlight.pm
> index e07cd5931d..536754583b 100644
> --- a/contrib/diff-highlight/DiffHighlight.pm
> +++ b/contrib/diff-highlight/DiffHighlight.pm
> @@ -21,34 +21,82 @@ my $RESET = "\x1b[m";
> my $COLOR = qr/\x1b\[[0-9;]*m/;
> my $BORING = qr/$COLOR|\s/;
>
> -# The patch portion of git log -p --graph should only ever have preceding |
> and
> -# not / or \ as merge history only shows up on the commit line.
> -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
> -
> my @removed;
> my @added;
> my $in_hunk;
> +my $graph_indent = 0;
>
> our $line_cb = sub { print @_ };
> our $flush_cb = sub { local $| = 1 };
>
> -sub handle_line {
> +# Count the visible width of a string, excluding any terminal color
> sequences.
> +sub visible_width {
> local $_ = shift;
> + my $ret = 0;
> + while (length) {
> + if (s/^$COLOR//) {
> + # skip colors
> + } elsif (s/^.//) {
> + $ret++;
> + }
> + }
> + return $ret;
> +}
> +
> +# Return a substring of $str, omitting $len visible characters from the
> +# beginning, where terminal color sequences do not count as visible.
> +sub visible_substr {
> + my ($str, $len) = @_;
> + while ($len > 0) {
> + if ($str =~ s/^$COLOR//) {
> + next
> + }
> + $str =~ s/^.//;
> + $len--;
> + }
> + return $str;
> +}
> +
> +sub handle_line {
> + my $orig = shift;
> + local $_ = $orig;
> +
> + # match a graph line that begins a commit
> + if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
> + $COLOR?\*$COLOR?[ ] # a "*" with its trailing space
> + (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
> + [ ]* # trailing whitespace for merges
Hi Peff, thanks for looking at this. I've only had a quick look through
but I wonder if this will be confused by commit messages that contain
* bullet points
* like this
Best Wishes
Phillip
> + /x) {
> + my $graph_prefix = $&;
> +
> + # We must flush before setting graph indent, since the
> + # new commit may be indented differently from what we
> + # queued.
> + flush();
> + $graph_indent = visible_width($graph_prefix);
> +
> + } elsif ($graph_indent) {
> + if (length($_) < $graph_indent) {
> + $graph_indent = 0;
> + } else {
> + $_ = visible_substr($_, $graph_indent);
> + }
> + }
>
> if (!$in_hunk) {
> - $line_cb->($_);
> - $in_hunk = /^$GRAPH*$COLOR*\@\@ /;
> + $line_cb->($orig);
> + $in_hunk = /^$COLOR*\@\@ /;
> }
> - elsif (/^$GRAPH*$COLOR*-/) {
> - push @removed, $_;
> + elsif (/^$COLOR*-/) {
> + push @removed, $orig;
> }
> - elsif (/^$GRAPH*$COLOR*\+/) {
> - push @added, $_;
> + elsif (/^$COLOR*\+/) {
> + push @added, $orig;
> }
> else {
> flush();
> - $line_cb->($_);
> - $in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
> + $line_cb->($orig);
> + $in_hunk = /^$COLOR*[\@ ]/;
> }
>
> # Most of the time there is enough output to keep things streaming,
> @@ -225,8 +273,8 @@ sub is_pair_interesting {
> my $suffix_a = join('', @$a[($sa+1)..$#$a]);
> my $suffix_b = join('', @$b[($sb+1)..$#$b]);
>
> - return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
> - $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
> + return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/
> ||
> + visible_substr($prefix_b, $graph_indent) !~
> /^$COLOR*\+$BORING*$/ ||
> $suffix_a !~ /^$BORING*$/ ||
> $suffix_b !~ /^$BORING*$/;
> }
> diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh
> b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> index bf0c270d60..f6f5195d00 100755
> --- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
> +++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> @@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined
> diffs' '
> test_cmp expect actual
> '
>
> +test_expect_success 'diff-highlight handles --graph with leading dash' '
> + cat >file <<-\EOF &&
> + before
> + the old line
> + -leading dash
> + EOF
> + git add file &&
> + git commit -m before &&
> +
> + sed s/old/new/ <file >file.tmp &&
> + mv file.tmp file &&
> + git add file &&
> + git commit -m after &&
> +
> + cat >expect <<-EOF &&
> + --- a/file
> + +++ b/file
> + @@ -1,3 +1,3 @@
> + before
> + -the ${CW}old${CR} line
> + +the ${CW}new${CR} line
> + -leading dash
> + EOF
> + git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw &&
> + trim_graph <actual.raw | sed -n "/^---/,\$p" >actual &&
> + test_cmp expect actual
> +'
> +
> test_done
>