Re: [PATCH] contrib/diff-highlight: multibyte characters diff
Thanks for reviewing. as you wrote, diff content may not be utf8 at all. and we don't know that the user's terminal watns is utf8. I think your trying utf8 decode and fall back approach is better than my patch, and do work well. is using $@ for catching error like the patch below? According to perldoc Encode.pm, encode/decode with FB_CROAK may destroy original string. We should probabry use LEAVE_SRC on decode_utf8's second argument. --- contrib/diff-highlight/diff-highlight | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index c4404d4..0743851 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -2,6 +2,7 @@ use warnings FATAL = 'all'; use strict; +use Encode qw(decode_utf8 encode_utf8); # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. @@ -73,13 +74,23 @@ sub show_hunk { my @queue; for (my $i = 0; $i @$a; $i++) { - my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]); - print $rm; - push @queue, $add; + my ($a_dec, $encode_rm) = decode($a-[$i]); + my ($b_dec, $encode_add) = decode($b-[$i]); + my ($rm, $add) = highlight_pair($a_dec, $b_dec); + print $encode_rm-($rm); + push @queue, $encode_add-($add); } print @queue; } +sub decode { + my $orig = shift; + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK | Encode::LEAVE_SRC) }; + return $@ ? + ($orig, sub { shift }) : + ($decoded, sub { encode_utf8(shift) }); +} + sub highlight_pair { my @a = split_line(shift); my @b = split_line(shift); -- 1.8.5.3 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On 12 February 2014 20:59, Jeff King p...@peff.net wrote: +sub decode { + my $orig = shift; + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; + return defined $decoded ? I'd still advocate checking $@ here, rather than the defined $decoded check. + ($decoded, sub { encode_utf8(shift) }) : + ($orig, sub { shift }); +} + -- Thomas Adam -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote: On 12 February 2014 20:59, Jeff King p...@peff.net wrote: +sub decode { + my $orig = shift; + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; + return defined $decoded ? I'd still advocate checking $@ here, rather than the defined $decoded check. I don't mind changing it, but for my edification, what is the advantage? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote: On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote: On 12 February 2014 20:59, Jeff King p...@peff.net wrote: +sub decode { + my $orig = shift; + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; + return defined $decoded ? I'd still advocate checking $@ here, rather than the defined $decoded check. I don't mind changing it, but for my edification, what is the advantage? The documentation for decode_utf8 isn't clear, but I don't know if it can ever return undef. What, for example, does it return if $orig is not defined? That's the benefit: it's immediately clear to the user that you're interested in whether it threw an exception, rather than whether it produced a given value. That said, $DAYJOB is a Perl shop, and I would certainly not reject this code in review, and depending on the situation, I might even write something like this. I personally think it's fine. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 signature.asc Description: Digital signature
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Thu, Feb 13, 2014 at 01:17:54AM +, brian m. carlson wrote: On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote: On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote: On 12 February 2014 20:59, Jeff King p...@peff.net wrote: +sub decode { + my $orig = shift; + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; + return defined $decoded ? I'd still advocate checking $@ here, rather than the defined $decoded check. I don't mind changing it, but for my edification, what is the advantage? The documentation for decode_utf8 isn't clear, but I don't know if it can ever return undef. What, for example, does it return if $orig is not defined? That's the benefit: it's immediately clear to the user that you're interested in whether it threw an exception, rather than whether it produced a given value. I'd argue that I am more interested in whether it returned a value. Let us imagine for a moment that decode_utf8 could return undef without throwing an exception. What should the function return in such a case? I think the only sensible thing is the original (and to indicate that the result was not converted). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] contrib/diff-highlight: multibyte characters diff
Signed-off-by: Yoshihiro Sugi sugi1...@gmail.com diff-highlight split each hunks and compare them as byte sequences. it causes problems when diff hunks include multibyte characters. This change enable to work on such cases by decoding inputs and encoding output as utf8 string. --- contrib/diff-highlight/diff-highlight | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index c4404d4..49b4f53 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -2,6 +2,7 @@ use warnings FATAL = 'all'; use strict; +use Encode qw(decode_utf8 encode_utf8); # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. @@ -15,8 +16,9 @@ my @added; my $in_hunk; while () { + $_ = decode_utf8($_); if (!$in_hunk) { - print; + print encode_utf8($_); $in_hunk = /^$COLOR*\@/; } elsif (/^$COLOR*-/) { @@ -30,7 +32,7 @@ while () { @removed = (); @added = (); - print; + print encode_utf8($_); $in_hunk = /^$COLOR*[\@ ]/; } @@ -58,7 +60,8 @@ sub show_hunk { # If one side is empty, then there is nothing to compare or highlight. if (!@$a || !@$b) { - print @$a, @$b; + print encode_utf8($_) for @$a; + print encode_utf8($_) for @$b; return; } @@ -67,17 +70,18 @@ sub show_hunk { # stupid, and only handle multi-line hunks that remove and add the same # number of lines. if (@$a != @$b) { - print @$a, @$b; + print encode_utf8($_) for @$a; + print encode_utf8($_) for @$b; return; } my @queue; for (my $i = 0; $i @$a; $i++) { my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]); - print $rm; + print encode_utf8($rm); push @queue, $add; } - print @queue; + print encode_utf8($_) for @queue; } sub highlight_pair { -- 1.8.5.3 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
Yoshihiro Sugi sugi1...@gmail.com writes: Signed-off-by: Yoshihiro Sugi sugi1...@gmail.com diff-highlight split each hunks and compare them as byte sequences. it causes problems when diff hunks include multibyte characters. This change enable to work on such cases by decoding inputs and encoding output as utf8 string. --- contrib/diff-highlight/diff-highlight | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index c4404d4..49b4f53 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -2,6 +2,7 @@ use warnings FATAL = 'all'; use strict; +use Encode qw(decode_utf8 encode_utf8); # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. @@ -15,8 +16,9 @@ my @added; my $in_hunk; while () { + $_ = decode_utf8($_); if (!$in_hunk) { - print; + print encode_utf8($_); $in_hunk = /^$COLOR*\@/; } elsif (/^$COLOR*-/) { @@ -30,7 +32,7 @@ while () { @removed = (); @added = (); - print; + print encode_utf8($_); $in_hunk = /^$COLOR*[\@ ]/; } @@ -58,7 +60,8 @@ sub show_hunk { # If one side is empty, then there is nothing to compare or highlight. if (!@$a || !@$b) { - print @$a, @$b; + print encode_utf8($_) for @$a; + print encode_utf8($_) for @$b; return; } @@ -67,17 +70,18 @@ sub show_hunk { # stupid, and only handle multi-line hunks that remove and add the same # number of lines. if (@$a != @$b) { - print @$a, @$b; + print encode_utf8($_) for @$a; + print encode_utf8($_) for @$b; return; } my @queue; for (my $i = 0; $i @$a; $i++) { my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]); - print $rm; + print encode_utf8($rm); push @queue, $add; } - print @queue; + print encode_utf8($_) for @queue; } sub highlight_pair { -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html