Bug#480774: diff -y in UTF-8: bad alignment

2009-09-01 Santiago Vila
On Sat, 30 May 2009, Bruno Haible wrote:

> Vincent Lefevre reported in
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480774:
> > When diff -y is run on files that contain multibyte characters (in
> > UTF-8), the alignment is incorrect in the output.
>
> This is fixed upstream, in the CVS version of diffutils at
>   https://savannah.gnu.org/cvs/?group=diffutils
>
> It is not fixed in version 2.8.7 on alpha.gnu.org.
>
> FYI, OpenSUSE 11 ships with the newest diffutils from the CVS.

Hmm, is that not an indication that diffutils 2.8.1 is old and the
world expects a new diffutils release to happen?

Are there any plans for a diffutils 2.9 anytime soon?

I'd love to clean up some old bugs; there are too many of them
at http://bugs.debian.org/diffutils.



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#480774: diff -y in UTF-8: bad alignment

2009-05-29 Bruno Haible
Vincent Lefevre reported in
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480774:
> When diff -y is run on files that contain multibyte characters (in
> UTF-8), the alignment is incorrect in the output.

This is fixed upstream, in the CVS version of diffutils at
  https://savannah.gnu.org/cvs/?group=diffutils

It is not fixed in version 2.8.7 on alpha.gnu.org.

FYI, OpenSUSE 11 ships with the newest diffutils from the CVS.

Thanks for the report.

Bruno






Bug#480774: diff -y in UTF-8: bad alignment

2008-06-18 Santiago Vila
On Mon, 12 May 2008, Vincent Lefevre wrote:

> Package: diff
> Version: 2.8.1-12
> Severity: normal
>
> When diff -y is run on files that contain multibyte characters (in
> UTF-8), the alignment is incorrect in the output. For instance,
>
>   diff -y file1 file2
>
> on the attached files file1 and file2 produces the attached result
> (see the out attachment): lines with non-ASCII characters have
> additional spaces.
>
> -- System Information:
> Debian Release: lenny/sid
>   APT prefers unstable
>   APT policy: (500, 'unstable'), (500, 'stable')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 2.6.24.5-20080423 (SMP w/2 CPU cores; PREEMPT)
> Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)

Two questions:

Could it be because you are using a locale which is not UTF-8 friendly
(the one on the "Locale:" line above)?

Does this happen with version 2.8.7 in experimental?






Bug#480774: diff -y in UTF-8: bad alignment

2008-06-18 Vincent Lefevre
On 2008-06-18 12:39:49 +0200, Santiago Vila wrote:
> Could it be because you are using a locale which is not UTF-8 friendly?
> (the one in the line Locale: above).

No, I use both ISO-8859-1 and UTF-8 locales. I did the test in a
uxterm, with:

LANG=POSIX
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=POSIX
LC_TIME=en_DK
LC_COLLATE=POSIX
LC_MONETARY=POSIX
LC_MESSAGES=POSIX
LC_PAPER=POSIX
LC_NAME=POSIX
LC_ADDRESS=POSIX
LC_TELEPHONE=POSIX
LC_MEASUREMENT=POSIX
LC_IDENTIFICATION=POSIX
LC_ALL=

but reported the bug from an xterm (hence the default ISO-8859-1
locales shown in the report).

> Does this happen with version 2.8.7 in experimental?

Yes, it occurs with diff 2.8.7-0.2.

The bug also occurs with:

  LC_ALL=en_US.UTF-8 diff -y file1 file2

To summarize, with the same files, here is the number of spaces before
the pipe character in the output, depending on the locale:

Locales   ISO-8859-1   UTF-8
Line 1         2          2    (contents: ab345...)
Line 2         1          3    (contents: àb345... in UTF-8)

The behavior is correct under ISO-8859-1 locales: 'à' encoded in
UTF-8 takes 2 bytes and is thus seen as 2 characters in ISO-8859-1,
so line 2 is one column wider and gets one space fewer. Under UTF-8,
however, 'à' is a single character, so line 2 should get 2 spaces,
like line 1, instead of 3.
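The arithmetic behind the table can be checked with a small model: padding equals the gutter column minus the number of columns the line is assumed to occupy. In this Python sketch the constant GUTTER_COL = 62 is an assumption inferred from line 1 (60 one-column characters followed by 2 spaces); it is not taken from the diff sources. Counting bytes reproduces the ISO-8859-1 column (2 and 1), counting characters gives the 2 spaces expected under UTF-8, and the 3 spaces actually observed fit neither model.

```python
# Sketch of the padding arithmetic (a model, not diff's actual code).
# Assumption: the '|' gutter sits at a fixed column, inferred from Line 1,
# whose 60 one-column characters are followed by 2 spaces.
GUTTER_COL = 62  # hypothetical constant: 60 columns of text + 2 spaces

TAIL = "34567890" + "1234567890" * 5   # 58 ASCII digits shared by both lines
LINE1 = "ab" + TAIL                    # 60 characters, 60 bytes in UTF-8
LINE2 = "\u00e0b" + TAIL               # 'à' (U+00E0): 60 characters, 61 bytes


def pad_if_bytes_are_columns(text: str) -> int:
    """Padding when each UTF-8 byte counts as one column (ISO-8859-1 view)."""
    return GUTTER_COL - len(text.encode("utf-8"))


def pad_if_chars_are_columns(text: str) -> int:
    """Padding when each character counts as one column (UTF-8 view).

    This matches the display width only for narrow characters such as 'à';
    wide characters would need a wcwidth()-style lookup instead.
    """
    return GUTTER_COL - len(text)


for name, line in (("Line 1", LINE1), ("Line 2", LINE2)):
    print(name, pad_if_bytes_are_columns(line), pad_if_chars_are_columns(line))
```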

-- 
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)






Bug#480774: diff -y in UTF-8: bad alignment

2008-06-18 Vincent Lefevre
found 480774 2.8.7-0.2
thanks

Also, if I replace the à with a € (euro sign, which takes 3 bytes
in UTF-8 instead of 2 for à), I still get 3 spaces under UTF-8
locales. So it seems that the encoding length doesn't matter.
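The byte counts behind this observation are easy to check. The Python sketch below reuses the assumed gutter column of 62 (inferred from the all-ASCII line, not from the diff sources) and computes the padding a purely byte-counting implementation would produce for a line starting with 'a', 'à' or '€'. Such an implementation would give 'à' and '€' different padding (1 and 0 spaces), whereas 3 spaces were observed for both, which is consistent with the conclusion that the encoding length is not what drives the misalignment.

```python
# Sketch: UTF-8 byte length of the characters tried in the report, and the
# padding a byte-counting implementation would give for "<ch>b345...890".
GUTTER_COL = 62  # hypothetical: inferred from the all-ASCII line (60 chars + 2 spaces)
TAIL = "34567890" + "1234567890" * 5  # the 58 ASCII digits that end each test line


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies in UTF-8."""
    return len(text.encode("utf-8"))


def byte_counting_padding(first_char: str) -> int:
    """Spaces before '|' if every byte of the line counted as one column."""
    return GUTTER_COL - utf8_len(first_char + "b" + TAIL)


for ch in ("a", "\u00e0", "\u20ac"):  # 'a', 'à' (U+00E0), '€' (U+20AC)
    # Printed as U+XXXX so the output stays ASCII-only.
    print(f"U+{ord(ch):04X}", utf8_len(ch), byte_counting_padding(ch))
```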

And if I also replace the 'b' with an 'è', the pipe character no
longer appears in the output at all!







Bug#480774: diff -y in UTF-8: bad alignment

2008-05-11 Vincent Lefevre
Package: diff
Version: 2.8.1-12
Severity: normal

When diff -y is run on files that contain multibyte characters (in
UTF-8), the alignment is incorrect in the output. For instance,

  diff -y file1 file2

on the attached files file1 and file2 produces the attached result
(see the out attachment): lines with non-ASCII characters have
additional spaces.

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24.5-20080423 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages diff depends on:
ii  libc6 2.7-10 GNU C Library: Shared libraries

diff recommends no packages.

-- no debconf information
file1:
ab3456789012345678901234567890123456789012345678901234567890
àb3456789012345678901234567890123456789012345678901234567890

file2:
ac3456789012345678901234567890123456789012345678901234567890
àc3456789012345678901234567890123456789012345678901234567890

out:
ab3456789012345678901234567890123456789012345678901234567890  | 
ac3456789012345678901234567890123456789012345678901234567890
àb3456789012345678901234567890123456789012345678901234567890   |
àc3456789012345678901234567890123456789012345678901234567890
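To rerun the measurement from the report, the attached files can be recreated and the spaces before the '|' counted programmatically. This Python sketch assumes a `diff` binary on PATH and, for a meaningful result, a generated en_US.UTF-8 locale; only `diff -y` and the `LC_ALL` setting come from the thread, while the helper names `spaces_before_gutter` and `run_diff_y` are hypothetical. It makes no claim about what a particular diffutils version prints.

```python
# Sketch: recreate the attached file1/file2, run `diff -y` under a chosen
# locale, and count the spaces written before the '|' gutter on each line.
# Assumptions: `diff` is on PATH; the requested locale is installed.
import os
import shutil
import subprocess
import tempfile
from typing import List

TAIL = "34567890" + "1234567890" * 5      # shared 58-character suffix
FILE1 = ["ab" + TAIL, "\u00e0b" + TAIL]   # 'ab...' and 'àb...'
FILE2 = ["ac" + TAIL, "\u00e0c" + TAIL]   # 'ac...' and 'àc...'


def spaces_before_gutter(line: str) -> int:
    """Count the spaces just before the first '|' (-1 if there is none).

    Only spaces are counted; padding written as tabs would give 0.
    """
    pos = line.find("|")
    if pos < 0:
        return -1
    left = line[:pos]
    return len(left) - len(left.rstrip(" "))


def run_diff_y(locale: str = "en_US.UTF-8") -> List[str]:
    """Run `diff -y file1 file2` with LC_ALL=locale and return its lines."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, lines in (("file1", FILE1), ("file2", FILE2)):
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as f:
                f.write("\n".join(lines) + "\n")
            paths.append(path)
        env = dict(os.environ, LC_ALL=locale)
        # diff exits with status 1 when the files differ, so no check=True.
        result = subprocess.run(["diff", "-y", *paths], env=env,
                                capture_output=True)
    return result.stdout.decode("utf-8", errors="replace").splitlines()


if __name__ == "__main__" and shutil.which("diff"):
    for n, out_line in enumerate(run_diff_y(), 1):
        print(f"output line {n}: {spaces_before_gutter(out_line)} spaces before '|'")
```

Passing another locale name to `run_diff_y`, such as the report's en_US.ISO8859-1 if it is generated on the system, gives the other column of the comparison.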