Bug#480774: diff -y in UTF-8: bad alignment
On Sat, 30 May 2009, Bruno Haible wrote:

> Vincent Lefevre reported in
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480774:
>
> When diff -y is run on files that contain multibyte characters (in
> UTF-8), the alignment is incorrect in the output.
>
> This is fixed upstream, in the CVS version of diffutils at
> https://savannah.gnu.org/cvs/?group=diffutils
> It is not fixed in version 2.8.7 on alpha.gnu.org.
>
> FYI, OpenSUSE 11 ships with the newest diffutils from the CVS.

Hmm, is that not an indication that diffutils 2.8.1 is old and that the world expects a new diffutils release to happen?

Are there any plans for a diffutils 2.9 anytime soon? I'd love to clean up some old bugs; there are too many of them at http://bugs.debian.org/diffutils.

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#480774: diff -y in UTF-8: bad alignment
Vincent Lefevre reported in
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480774:

When diff -y is run on files that contain multibyte characters (in UTF-8), the alignment is incorrect in the output.

This is fixed upstream, in the CVS version of diffutils at
https://savannah.gnu.org/cvs/?group=diffutils
It is not fixed in version 2.8.7 on alpha.gnu.org.

FYI, OpenSUSE 11 ships with the newest diffutils from the CVS.

Thanks for the report.

Bruno
Bug#480774: diff -y in UTF-8: bad alignment
On Mon, 12 May 2008, Vincent Lefevre wrote:

> Package: diff
> Version: 2.8.1-12
> Severity: normal
>
> When diff -y is run on files that contain multibyte characters (in
> UTF-8), the alignment is incorrect in the output. For instance,
> diff -y file1 file2 on the attached files file1 and file2 produces
> the attached result (see out attachment): lines with non-ASCII
> characters have additional spaces.
>
> -- System Information:
> Debian Release: lenny/sid
>   APT prefers unstable
>   APT policy: (500, 'unstable'), (500, 'stable')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 2.6.24.5-20080423 (SMP w/2 CPU cores; PREEMPT)
> Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)

Two questions:

Could it be because you are using a locale which is not UTF-8 friendly (the one in the "Locale:" line above)?

Does this happen with version 2.8.7 in experimental?
Bug#480774: diff -y in UTF-8: bad alignment
On 2008-06-18 12:39:49 +0200, Santiago Vila wrote:

> Could it be because you are using a locale which is not UTF-8
> friendly? (the one in the line Locale: above).

No, I use both ISO-8859-1 and UTF-8 locales. I did the test in a uxterm, with:

  LANG=POSIX
  LC_CTYPE=en_US.UTF-8
  LC_NUMERIC=POSIX
  LC_TIME=en_DK
  LC_COLLATE=POSIX
  LC_MONETARY=POSIX
  LC_MESSAGES=POSIX
  LC_PAPER=POSIX
  LC_NAME=POSIX
  LC_ADDRESS=POSIX
  LC_TELEPHONE=POSIX
  LC_MEASUREMENT=POSIX
  LC_IDENTIFICATION=POSIX
  LC_ALL=

but reported the bug from an xterm (so, using my default ISO-8859-1 locales).

> Does this happen with version 2.8.7 in experimental?

Yes, it occurs with diff 2.8.7-0.2. The bug also occurs with:

  LC_ALL=en_US.UTF-8 diff -y file1 file2

To summarize, with the same files, here is the number of spaces before the pipe character in the output, depending on the locale:

  Locales   ISO-8859-1   UTF-8
  Line 1        2          2     (contents: ab345...)
  Line 2        1          3     (contents: àb345... in UTF-8)

The behavior is correct under ISO-8859-1 locales (since 'à' encoded in UTF-8 takes 2 bytes and is thus seen as 2 characters in ISO-8859-1). However, under UTF-8, the number of spaces should be 2 instead of 3 for line 2.

-- 
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
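The expected counts in the summary above can be checked with a small Python sketch. It is not how diff computes its padding; it only models "one terminal cell per decoded character". The helper name padding and the column width of 62 are assumptions, the latter chosen so that the ASCII line gets the 2 spaces reported above.

```python
# Model only: pad a left-hand column assuming one cell per decoded character.
# COLUMN_WIDTH is a hypothetical value picked to match the reported 2 spaces
# for the pure-ASCII line; it is not taken from the diffutils sources.
COLUMN_WIDTH = 62

TAIL = "34567890" + "1234567890" * 5      # the 58 digits used in the test files
LINE1 = "ab" + TAIL                       # ASCII only
LINE2 = "\u00e0b" + TAIL                  # starts with 'à' (U+00E0)


def padding(text: str, locale_encoding: str) -> int:
    """Spaces needed after `text` when its UTF-8 bytes are read in the given encoding."""
    raw = text.encode("utf-8")            # bytes as stored in the file
    seen = raw.decode(locale_encoding)    # characters as the locale interprets them
    return COLUMN_WIDTH - len(seen)


for encoding in ("latin-1", "utf-8"):
    print(encoding, padding(LINE1, encoding), padding(LINE2, encoding))
# latin-1 2 1
# utf-8 2 2
```

Under this model the ISO-8859-1 column matches the observed output (2 and 1), and the UTF-8 column gives 2 for both lines, which is the count the report says diff should produce instead of 3.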
Bug#480774: diff -y in UTF-8: bad alignment
found 480774 2.8.7-0.2
thanks

Also, if I replace the à by a € (euro symbol, which takes 3 bytes instead of 2 for à), I also get 3 spaces under UTF-8 locales. So, it seems that the encoding length doesn't matter.

And if I also replace the 'b' by a 'è', then the pipe character no longer appears in the output!

-- 
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
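For reference, the byte counts mentioned above can be confirmed with a short Python sketch. The helper name utf8_sizes is hypothetical, and the sketch says nothing about what diff does internally; it only shows that à and € are each a single character with different UTF-8 lengths.

```python
def utf8_sizes(ch: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(ch), len(ch.encode("utf-8"))


# 'a' (ASCII), 'à' (U+00E0) and '€' (U+20AC), written as escapes to avoid
# any ambiguity about how this snippet itself is encoded.
sizes = {ch: utf8_sizes(ch) for ch in ("a", "\u00e0", "\u20ac")}

for ch, (chars, nbytes) in sizes.items():
    print(f"U+{ord(ch):04X}: {chars} character, {nbytes} byte(s) in UTF-8")
# U+0061: 1 character, 1 byte(s) in UTF-8
# U+00E0: 1 character, 2 byte(s) in UTF-8
# U+20AC: 1 character, 3 byte(s) in UTF-8
```

Since both à and € give the same 3 spaces despite their different byte lengths, the extra space does not look like a simple byte count.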
Bug#480774: diff -y in UTF-8: bad alignment
Package: diff
Version: 2.8.1-12
Severity: normal

When diff -y is run on files that contain multibyte characters (in UTF-8), the alignment is incorrect in the output. For instance, diff -y file1 file2 on the attached files file1 and file2 produces the attached result (see out attachment): lines with non-ASCII characters have additional spaces.

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.24.5-20080423 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages diff depends on:
ii  libc6  2.7-10  GNU C Library: Shared libraries

diff recommends no packages.

-- no debconf information

[file1]
ab3456789012345678901234567890123456789012345678901234567890
àb3456789012345678901234567890123456789012345678901234567890

[file2]
ac3456789012345678901234567890123456789012345678901234567890
àc3456789012345678901234567890123456789012345678901234567890

[out]
ab3456789012345678901234567890123456789012345678901234567890 | ac3456789012345678901234567890123456789012345678901234567890
àb3456789012345678901234567890123456789012345678901234567890 | àc3456789012345678901234567890123456789012345678901234567890