Hello. Received this report from the Debian bug system. I initially believed this to be a duplicate of Debian Bug#633978, but it's not.
Here is a way to reproduce it, provided by the submitter after the initial report:

--------------------------------------------------------
Here are a few commands you may use to reproduce the bug:

mkdir d1 d2
echo azerty > "d1/エンドカード1"
echo qsdfgh > "d2/ブックレット1"

If the bug is present, diff will return:

> LANG=some_non_asian_LOCALE.utf8 diff -r d1 d2
1c1
< azerty
---
> qsdfgh

If the bug is not present, you will have something like:

> LANG=C diff -r d1 d2
Only in d1: エンドカード1
Only in d2: ブックレット1
--------------------------------------------------------

I can also reproduce it with diffutils 3.3. This is the output in that case:

diff "d1/\343\202\250\343\203\263\343\203\211\343\202\253\343\203\274\343\203\2111" "d2/\343\203\226\343\203\203\343\202\257\343\203\254\343\203\203\343\203\2101"
1c1
< azerty
---
> qsdfgh

The initial report follows:

---------- Forwarded message ----------
From: Philippe Errembault
To: Debian Bug Tracking System <[email protected]>
Date: Fri, 29 Mar 2013 03:10:46 +0100
Subject: Bug#704182: diffutils: Diff -r will confusion between asian characters in filenames, when locale are non asian - UTF-8.

Package: diffutils
Version: 1:3.0-1
Severity: normal

I don't know if this bug is caused by diff or by strcoll.

When comparing filenames with strcoll under non-Asian UTF-8 locales, Chinese characters are considered identical, which leads to confusion between files that are different.

E.g.: if you diff -r two directories whose files are listed in different orders, because they were on different file systems written by different OSes (for example, I wanted to diff a copy on a server of a directory from an NTFS disk), or simply because the file lists are not the same and the sort happens differently, then diff may consider two different files as being the same, and report differences because it compares different files.
For example, in my situation, it believed that "エンドカード1.jpg" and "ブックレット1.jpg" were files with the same name and reported errors between them. The point is that I don't know whether or not it is normal that strcoll("エンドカード1.jpg", "ブックレット1.jpg"); returns 0 when the locale is anything_non_asian.utf-8 [...]
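
For anyone who wants to check the strcoll question without going through diff, here is a minimal sketch in Python, whose locale.strcoll wraps the C library's collation function. The locale name "en_US.UTF-8" is only an assumption (substitute any non-Asian UTF-8 locale that is installed), and the helper names collation_tie and compare_names are hypothetical. The byte-wise tie-break in compare_names is one possible guard against distinct names collating as equal; it is not presented here as what diff itself does.

```python
import locale


def collation_tie(a, b, collate=locale.strcoll):
    """Return True when two distinct names collate as equal."""
    return a != b and collate(a, b) == 0


def compare_names(a, b, collate=locale.strcoll):
    """Order names by collation, breaking ties on the raw UTF-8 bytes."""
    result = collate(a, b)
    if result != 0:
        return result
    raw_a, raw_b = a.encode("utf-8"), b.encode("utf-8")
    return (raw_a > raw_b) - (raw_a < raw_b)


if __name__ == "__main__":
    # Assumption: this locale is installed on the machine running the check.
    try:
        locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
    except locale.Error:
        print("locale not available, staying in the current one")

    a, b = "エンドカード1.jpg", "ブックレット1.jpg"
    print("strcoll:", locale.strcoll(a, b))
    print("collates as equal:", collation_tie(a, b))
    print("with byte tie-break:", compare_names(a, b))
```

If strcoll prints 0 for these two names on an affected system, that would match the behaviour described in the report; compare_names still returns a non-zero result in that case because the UTF-8 byte sequences differ.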
