Re: [PATCH 7/7] diff-highlight: detect --graph by indent
On Thu, Mar 22, 2018 at 10:59:31AM +, Phillip Wood wrote: > > +sub handle_line { > > + my $orig = shift; > > + local $_ = $orig; > > + > > + # match a graph line that begins a commit > > + if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space > > +$COLOR?\*$COLOR?[ ] # a "*" with its trailing space > > + (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|" > > +[ ]* # trailing whitespace for merges > > Hi Peff, thanks for looking at this. I've only had a quick look through > but I wonder if this will be confused by commit messages that contain > * bullet points > * like this I think we should be OK because the commit messages are indented by 4 spaces, and we allow only single spaces amidst the graph-drawing bits (and we respect the "*" only in those graph-drawing bits). So I think you could fool it with something like: git log --graph --format='* oops!' or maybe even: git log --graph --format='%B' but not with a regular git-log format. Those final corner cases I don't think we can overcome; it's just syntactically ambiguous. Unless perhaps we traced the graph lines from the start of the output and knew how many to expect, but that seems rather complicated. Ultimately the best solution would be to avoid this parsing nonsense entirely by teaching Git's internal diff to produce the highlighting internally. -Peff PS I also eyeballed the results of "git log --graph -p --no-merges -1000" piped through the old and new versions. There are several changes, but they all look like improvements.
Re: [PATCH 7/7] diff-highlight: detect --graph by indent
On 21/03/18 05:59, Jeff King wrote: > This patch fixes a corner case where diff-highlight may > scramble some diffs when combined with --graph. > > Commit 7e4ffb4c17 (diff-highlight: add support for --graph > output, 2016-08-29) taught diff-highlight to skip past the > graph characters at the start of each line with this regex: > > ($COLOR?\|$COLOR?\s+)* > > I.e., any series of pipes separated by and followed by > arbitrary whitespace. We need to match more than just a > single space because the commit in question may be indented > to accommodate other parts of the graph drawing. E.g.: > > * commit 1234abcd > | ... > | diff --git ... > > has only a single space, but for the last commit before a > fork: > > | | | > | * | commit 1234abcd > | |/ ... > | | diff --git > > the diff lines have more spaces between the pipes and the > start of the diff. > > However, when we soak up all of those spaces with the > $GRAPH regex, we may accidentally include the leading space > for a context line. That means we may consider the actual > contents of a context line as part of the diff syntax. In > other words, something like this: > >normal context line > -old line > +new line >-this is a context line with a leading dash > > would cause us to see that final context line as a removal > line, and we'd end up showing the hunk in the wrong order: > > normal context line > -old line >-this is a context line with a leading dash > +new line > > Instead, let's a be a little more clever about parsing the > graph. We'll look for the actual "*" line that marks the > start of a commit, and record the indentation we see there. > Then we can skip past that indentation when checking whether > the line is a hunk header, removal, addition, etc. > > There is one tricky thing: the indentation in bytes may be > different for various lines of the graph due to coloring. > E.g., the "*" on a commit line is generally shown without > color, but on the actual diff lines, it will be replaced > with a colorized "|" character, adding several bytes. We > work around this here by counting "visible" bytes. This is > unfortunately a bit more expensive, making us about twice as > slow to handle --graph output. But since this is meant to be > used interactively anyway, it's tolerably fast (and the > non-graph case is unaffected). > > One alternative would be to search for hunk header lines and > use their indentation (since they'd have the same colors as > the diff lines which follow). But that just opens up > different corner cases. If we see: > > | |@@ 1,2 1,3 @@ > > we cannot know if this is a real diff that has been > indented due to the graph, or if it's a context line that > happens to look like a diff header. We can only be sure of > the indent on the "*" lines, since we know those don't > contain arbitrary data (technically the user could include a > bunch of extra indentation via --format, but that's rare > enough to disregard). > > Reported-by: Phillip Wood > Signed-off-by: Jeff King > --- > contrib/diff-highlight/DiffHighlight.pm | 78 +++ > .../diff-highlight/t/t9400-diff-highlight.sh | 28 +++ > 2 files changed, 91 insertions(+), 15 deletions(-) > > diff --git a/contrib/diff-highlight/DiffHighlight.pm > b/contrib/diff-highlight/DiffHighlight.pm > index e07cd5931d..536754583b 100644 > --- a/contrib/diff-highlight/DiffHighlight.pm > +++ b/contrib/diff-highlight/DiffHighlight.pm > @@ -21,34 +21,82 @@ my $RESET = "\x1b[m"; > my $COLOR = qr/\x1b\[[0-9;]*m/; > my $BORING = qr/$COLOR|\s/; > > -# The patch portion of git log -p --graph should only ever have preceding | > and > -# not / or \ as merge history only shows up on the commit line. > -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/; > - > my @removed; > my @added; > my $in_hunk; > +my $graph_indent = 0; > > our $line_cb = sub { print @_ }; > our $flush_cb = sub { local $| = 1 }; > > -sub handle_line { > +# Count the visible width of a string, excluding any terminal color > sequences. > +sub visible_width { > local $_ = shift; > + my $ret = 0; > + while (length) { > + if (s/^$COLOR//) { > + # skip colors > + } elsif (s/^.//) { > + $ret++; > + } > + } > + return $ret; > +} > + > +# Return a substring of $str, omitting $len visible characters from the > +# beginning, where terminal color sequences do not count as visible. > +sub visible_substr { > + my ($str, $len) = @_; > + while ($len > 0) { > + if ($str =~ s/^$COLOR//) { > + next > + } > + $str =~ s/^.//; > + $len--; > + } > + return $str; > +} > + > +sub handle_line { > + my $orig = shift; > + local $_ = $orig; > + > + # match a graph line that begins a commit > + if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space > + $COLOR?
[PATCH 7/7] diff-highlight: detect --graph by indent
This patch fixes a corner case where diff-highlight may scramble some diffs when combined with --graph. Commit 7e4ffb4c17 (diff-highlight: add support for --graph output, 2016-08-29) taught diff-highlight to skip past the graph characters at the start of each line with this regex: ($COLOR?\|$COLOR?\s+)* I.e., any series of pipes separated by and followed by arbitrary whitespace. We need to match more than just a single space because the commit in question may be indented to accommodate other parts of the graph drawing. E.g.: * commit 1234abcd | ... | diff --git ... has only a single space, but for the last commit before a fork: | | | | * | commit 1234abcd | |/ ... | | diff --git the diff lines have more spaces between the pipes and the start of the diff. However, when we soak up all of those spaces with the $GRAPH regex, we may accidentally include the leading space for a context line. That means we may consider the actual contents of a context line as part of the diff syntax. In other words, something like this: normal context line -old line +new line -this is a context line with a leading dash would cause us to see that final context line as a removal line, and we'd end up showing the hunk in the wrong order: normal context line -old line -this is a context line with a leading dash +new line Instead, let's a be a little more clever about parsing the graph. We'll look for the actual "*" line that marks the start of a commit, and record the indentation we see there. Then we can skip past that indentation when checking whether the line is a hunk header, removal, addition, etc. There is one tricky thing: the indentation in bytes may be different for various lines of the graph due to coloring. E.g., the "*" on a commit line is generally shown without color, but on the actual diff lines, it will be replaced with a colorized "|" character, adding several bytes. We work around this here by counting "visible" bytes. This is unfortunately a bit more expensive, making us about twice as slow to handle --graph output. But since this is meant to be used interactively anyway, it's tolerably fast (and the non-graph case is unaffected). One alternative would be to search for hunk header lines and use their indentation (since they'd have the same colors as the diff lines which follow). But that just opens up different corner cases. If we see: | |@@ 1,2 1,3 @@ we cannot know if this is a real diff that has been indented due to the graph, or if it's a context line that happens to look like a diff header. We can only be sure of the indent on the "*" lines, since we know those don't contain arbitrary data (technically the user could include a bunch of extra indentation via --format, but that's rare enough to disregard). Reported-by: Phillip Wood Signed-off-by: Jeff King --- contrib/diff-highlight/DiffHighlight.pm | 78 +++ .../diff-highlight/t/t9400-diff-highlight.sh | 28 +++ 2 files changed, 91 insertions(+), 15 deletions(-) diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm index e07cd5931d..536754583b 100644 --- a/contrib/diff-highlight/DiffHighlight.pm +++ b/contrib/diff-highlight/DiffHighlight.pm @@ -21,34 +21,82 @@ my $RESET = "\x1b[m"; my $COLOR = qr/\x1b\[[0-9;]*m/; my $BORING = qr/$COLOR|\s/; -# The patch portion of git log -p --graph should only ever have preceding | and -# not / or \ as merge history only shows up on the commit line. -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/; - my @removed; my @added; my $in_hunk; +my $graph_indent = 0; our $line_cb = sub { print @_ }; our $flush_cb = sub { local $| = 1 }; -sub handle_line { +# Count the visible width of a string, excluding any terminal color sequences. +sub visible_width { local $_ = shift; + my $ret = 0; + while (length) { + if (s/^$COLOR//) { + # skip colors + } elsif (s/^.//) { + $ret++; + } + } + return $ret; +} + +# Return a substring of $str, omitting $len visible characters from the +# beginning, where terminal color sequences do not count as visible. +sub visible_substr { + my ($str, $len) = @_; + while ($len > 0) { + if ($str =~ s/^$COLOR//) { + next + } + $str =~ s/^.//; + $len--; + } + return $str; +} + +sub handle_line { + my $orig = shift; + local $_ = $orig; + + # match a graph line that begins a commit + if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space +$COLOR?\*$COLOR?[ ] # a "*" with its trailing space + (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|" +[ ]* # trailing whitespace for merges + /x) { + my $graph_prefix = $&; + + # We must flush befor