Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-13 Thread Yoshihiro Sugi
Thanks for reviewing.
as you wrote, diff content may not be utf8 at all. and we don't know that the 
user's terminal watns is utf8.
I think your trying utf8 decode and fall back approach is better than my patch, 
and do work well.

is using $@ for catching error like the patch below?
According to perldoc Encode.pm, encode/decode with FB_CROAK may destroy 
original string. We should probabry use LEAVE_SRC on decode_utf8's second 
argument.

---
 contrib/diff-highlight/diff-highlight | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index c4404d4..0743851 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -2,6 +2,7 @@
 
 use warnings FATAL = 'all';
 use strict;
+use Encode qw(decode_utf8 encode_utf8);
 
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
@@ -73,13 +74,23 @@ sub show_hunk {
 
my @queue;
for (my $i = 0; $i  @$a; $i++) {
-   my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]);
-   print $rm;
-   push @queue, $add;
+   my ($a_dec, $encode_rm) = decode($a-[$i]);
+   my ($b_dec, $encode_add) = decode($b-[$i]);
+   my ($rm, $add) = highlight_pair($a_dec, $b_dec);
+   print $encode_rm-($rm);
+   push @queue, $encode_add-($add);
}
print @queue;
 }
 
+sub decode {
+   my $orig = shift;
+   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK | 
Encode::LEAVE_SRC) };
+   return $@ ?
+  ($orig, sub { shift }) :
+  ($decoded, sub { encode_utf8(shift) });
+}
+
 sub highlight_pair {
my @a = split_line(shift);
my @b = split_line(shift);
-- 
1.8.5.3

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Thomas Adam
On 12 February 2014 20:59, Jeff King p...@peff.net wrote:
 +sub decode {
 +   my $orig = shift;
 +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
 +   return defined $decoded ?

I'd still advocate checking $@ here, rather than the defined $decoded check.

 +  ($decoded, sub { encode_utf8(shift) }) :
 +  ($orig, sub { shift });
 +}
 +

-- Thomas Adam
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Jeff King
On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote:

 On 12 February 2014 20:59, Jeff King p...@peff.net wrote:
  +sub decode {
  +   my $orig = shift;
  +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
  +   return defined $decoded ?
 
 I'd still advocate checking $@ here, rather than the defined $decoded check.

I don't mind changing it, but for my edification, what is the advantage?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread brian m. carlson
On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote:
 On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote:
 
  On 12 February 2014 20:59, Jeff King p...@peff.net wrote:
   +sub decode {
   +   my $orig = shift;
   +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
   +   return defined $decoded ?
  
  I'd still advocate checking $@ here, rather than the defined $decoded check.
 
 I don't mind changing it, but for my edification, what is the advantage?

The documentation for decode_utf8 isn't clear, but I don't know if it
can ever return undef.  What, for example, does it return if $orig is
not defined?  That's the benefit: it's immediately clear to the user
that you're interested in whether it threw an exception, rather than
whether it produced a given value.

That said, $DAYJOB is a Perl shop, and I would certainly not reject this
code in review, and depending on the situation, I might even write
something like this.  I personally think it's fine.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187


signature.asc
Description: Digital signature


Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Jeff King
On Thu, Feb 13, 2014 at 01:17:54AM +, brian m. carlson wrote:

 On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote:
  On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote:
  
   On 12 February 2014 20:59, Jeff King p...@peff.net wrote:
+sub decode {
+   my $orig = shift;
+   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
+   return defined $decoded ?
   
   I'd still advocate checking $@ here, rather than the defined $decoded 
   check.
  
  I don't mind changing it, but for my edification, what is the advantage?
 
 The documentation for decode_utf8 isn't clear, but I don't know if it
 can ever return undef.  What, for example, does it return if $orig is
 not defined?  That's the benefit: it's immediately clear to the user
 that you're interested in whether it threw an exception, rather than
 whether it produced a given value.

I'd argue that I am more interested in whether it returned a value. Let
us imagine for a moment that decode_utf8 could return undef without
throwing an exception. What should the function return in such a case?

I think the only sensible thing is the original (and to indicate that
the result was not converted).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-11 Thread Yoshihiro Sugi
Signed-off-by: Yoshihiro Sugi sugi1...@gmail.com

diff-highlight split each hunks and compare them as byte sequences.
it causes problems when diff hunks include multibyte characters.
This change enable to work on such cases by decoding inputs and encoding output 
as utf8 string.
---
 contrib/diff-highlight/diff-highlight | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index c4404d4..49b4f53 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -2,6 +2,7 @@
 
 use warnings FATAL = 'all';
 use strict;
+use Encode qw(decode_utf8 encode_utf8);
 
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
@@ -15,8 +16,9 @@ my @added;
 my $in_hunk;
 
 while () {
+   $_ = decode_utf8($_);
if (!$in_hunk) {
-   print;
+   print encode_utf8($_);
$in_hunk = /^$COLOR*\@/;
}
elsif (/^$COLOR*-/) {
@@ -30,7 +32,7 @@ while () {
@removed = ();
@added = ();
 
-   print;
+   print encode_utf8($_);
$in_hunk = /^$COLOR*[\@ ]/;
}
 
@@ -58,7 +60,8 @@ sub show_hunk {
 
# If one side is empty, then there is nothing to compare or highlight.
if (!@$a || !@$b) {
-   print @$a, @$b;
+   print encode_utf8($_) for @$a;
+   print encode_utf8($_) for @$b;
return;
}
 
@@ -67,17 +70,18 @@ sub show_hunk {
# stupid, and only handle multi-line hunks that remove and add the same
# number of lines.
if (@$a != @$b) {
-   print @$a, @$b;
+   print encode_utf8($_) for @$a;
+   print encode_utf8($_) for @$b;
return;
}
 
my @queue;
for (my $i = 0; $i  @$a; $i++) {
my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]);
-   print $rm;
+   print encode_utf8($rm);
push @queue, $add;
}
-   print @queue;
+   print encode_utf8($_) for @queue;
 }
 
 sub highlight_pair {
-- 
1.8.5.3

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-11 Thread Junio C Hamano
Yoshihiro Sugi sugi1...@gmail.com writes:

 Signed-off-by: Yoshihiro Sugi sugi1...@gmail.com

 diff-highlight split each hunks and compare them as byte sequences.
 it causes problems when diff hunks include multibyte characters.
 This change enable to work on such cases by decoding inputs and encoding 
 output as utf8 string.
 ---
  contrib/diff-highlight/diff-highlight | 16 ++--
  1 file changed, 10 insertions(+), 6 deletions(-)

 diff --git a/contrib/diff-highlight/diff-highlight 
 b/contrib/diff-highlight/diff-highlight
 index c4404d4..49b4f53 100755
 --- a/contrib/diff-highlight/diff-highlight
 +++ b/contrib/diff-highlight/diff-highlight
 @@ -2,6 +2,7 @@
  
  use warnings FATAL = 'all';
  use strict;
 +use Encode qw(decode_utf8 encode_utf8);
  
  # Highlight by reversing foreground and background. You could do
  # other things like bold or underline if you prefer.
 @@ -15,8 +16,9 @@ my @added;
  my $in_hunk;
  
  while () {
 + $_ = decode_utf8($_);
   if (!$in_hunk) {
 - print;
 + print encode_utf8($_);
   $in_hunk = /^$COLOR*\@/;
   }
   elsif (/^$COLOR*-/) {
 @@ -30,7 +32,7 @@ while () {
   @removed = ();
   @added = ();
  
 - print;
 + print encode_utf8($_);
   $in_hunk = /^$COLOR*[\@ ]/;
   }
  
 @@ -58,7 +60,8 @@ sub show_hunk {
  
   # If one side is empty, then there is nothing to compare or highlight.
   if (!@$a || !@$b) {
 - print @$a, @$b;
 + print encode_utf8($_) for @$a;
 + print encode_utf8($_) for @$b;
   return;
   }
  
 @@ -67,17 +70,18 @@ sub show_hunk {
   # stupid, and only handle multi-line hunks that remove and add the same
   # number of lines.
   if (@$a != @$b) {
 - print @$a, @$b;
 + print encode_utf8($_) for @$a;
 + print encode_utf8($_) for @$b;
   return;
   }
  
   my @queue;
   for (my $i = 0; $i  @$a; $i++) {
   my ($rm, $add) = highlight_pair($a-[$i], $b-[$i]);
 - print $rm;
 + print encode_utf8($rm);
   push @queue, $add;
   }
 - print @queue;
 + print encode_utf8($_) for @queue;
  }
  
  sub highlight_pair {
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html