The problem is in strcoll/strxfrm as described in: http://unix.stackexchange.com/questions/17198/where-has-my-uniq-or-sort-u-line-gone-with-some-unicode-characters
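For checking your own data, here is a minimal Python sketch. It assumes Python 3 on glibc, where the `locale` module wraps the C library's wide-character collation functions (wcscoll/wcsxfrm), which use the same LC_COLLATE tables. The sketch does three things:

- It reports distinct strings whose strxfrm keys coincide.
- It reproduces the `sort -u` line loss from the linked question.
- It shows one possible application-side mitigation: when the locale reports a tie, fall back to comparing code points. This is similar in spirit to the last-resort comparison GNU sort applies when -u is not given.

All function names are hypothetical, and the results depend on the installed locales and glibc version.

```python
import locale
import sys
from functools import cmp_to_key


def collate_with_tiebreak(a: str, b: str) -> int:
    """Hypothetical helper: compare by the current LC_COLLATE; if the locale
    reports a tie between distinct strings, fall back to code point order
    so that distinct strings never compare equal."""
    c = locale.strcoll(a, b)
    if c != 0:
        return c
    return (a > b) - (a < b)


def find_collation_ties(words):
    """Return (first, other) pairs of distinct strings whose strxfrm()
    keys are identical, i.e. strings the locale cannot tell apart."""
    first_by_key = {}
    ties = []
    for w in dict.fromkeys(words):  # drop exact duplicates, keep order
        key = locale.strxfrm(w)
        if key in first_by_key:
            ties.append((first_by_key[key], w))
        else:
            first_by_key[key] = w
    return ties


def naive_sort_unique(words):
    """Mimic the reported `sort -u` behavior: drop any line that collates
    equal to the previously kept line, even if its bytes differ."""
    out = []
    for w in sorted(words, key=locale.strxfrm):
        if not out or locale.strcoll(out[-1], w) != 0:
            out.append(w)
    return out


def safe_sort_unique(words):
    """Sort with the tie-breaking comparator and drop only exact duplicates."""
    out = []
    for w in sorted(words, key=cmp_to_key(collate_with_tiebreak)):
        if not out or out[-1] != w:
            out.append(w)
    return out


if __name__ == "__main__":
    try:
        locale.setlocale(locale.LC_COLLATE, "")  # honor LANG / LC_ALL / LC_COLLATE
    except locale.Error:
        pass  # requested locale not installed: stay in the "C" locale
    words = sys.argv[1:]
    print("strxfrm ties:", find_collation_ties(words))
    print("naive -u:    ", naive_sort_unique(words))
    print("tie-break -u:", safe_sort_unique(words))
```

Example invocation (the script name is arbitrary): `LANG=en_US.UTF-8 python3 ties.py a b c А В Г Ѯ Ѻ Ѳ`. On an affected system, this would be expected to list the three rare Cyrillic letters as ties and keep only one of them in the naive output. The tie-breaking version keeps all of them.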
$ LANG=en_US.UTF-8 perl -C255 -MPOSIX -le 'print "$_ ", unpack("h*", strxfrm($_)) foreach @ARGV' a b c А В Г Ѯ Ѻ Ѳ
a c010801020
b d010801020
c e010801020
А 2cbb10801090
В 2cdb10801090
Г 2ceb10801090
Ѯ 101010102c6b102c6b
Ѻ 101010102c6b102c6b
Ѳ 101010102c6b102c6b

The Latin and common Cyrillic characters all get different values, but the rare characters all transform to the same collation element. The same happens for Japanese kana, but not for kanji.

As the link states, this is pretty clearly a bug. The correct behavior would be to sort unknown characters after all known characters and still treat them as distinct. As a workaround, adding entries for all characters to every locale file in /usr/share/i18n/locales/ should work.

-- 
Alex