On 10.08.2016 at 17:25, Pádraig Brady wrote:
> On 10/08/16 16:15, Peter Ludikovsky wrote:
>>
>>
>> On 10.08.2016 at 16:51, Pádraig Brady wrote:
>>> On 10/08/16 15:21, Peter Ludikovsky wrote:
>>>> Package: coreutils
>>>> Version: 8.23-4
>>>> Severity: normal
>>>>
>>>> Dear Maintainer,
>>>>
>>>> This came up due to a posting on debian-user-german [1]. Apparently
>>>> certain Unicode characters, at least LEFT-TO-RIGHT EMBEDDING [2] and
>>>> RIGHT-TO-LEFT EMBEDDING [3] do not trigger the escape code display for
>>>> ls with the -b option.
>>>>
>>>> An example script is attached, output:
>>>>
>>>> $ bash unicode_bidir_test.sh
>>>> + touch LTR
>>>> + touch RTL
>>>> + /bin/ls -l
>>>> total 4
>>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 LTR
>>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 RTL
>>>> -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>>> + /bin/ls -lb
>>>> total 4
>>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 LTR
>>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 RTL
>>>> -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>>> + /bin/ls -lb LTR
>>>> /bin/ls: cannot access LTR: No such file or directory
>>>> + /bin/ls -lb LTR
>>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 LTR
>>>> + /bin/ls -lb RTL
>>>> /bin/ls: cannot access RTL: No such file or directory
>>>> + /bin/ls -lb RTL
>>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 RTL
>>>>
>>>> The expected output would be that those characters be shown, as they are
>>>> relevant when accessing a file on the command line.
>>>>
>>>> [1] https://lists.debian.org/debian-user-german/2016/08/msg00049.html
>>>> [2] http://www.fileformat.info/info/unicode/char/202a/index.htm
>>>> [3] http://www.fileformat.info/info/unicode/char/202b/index.htm
>>>>
>>>> -- System Information:
>>>> Debian Release: 8.5
>>>> APT prefers stable-updates
>>>> APT policy: (500, 'stable-updates'), (500, 'stable')
>>>> Architecture: amd64 (x86_64)
>>>>
>>>> Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
>>>> Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
>>>> Shell: /bin/sh linked to /bin/dash
>>>> Init: systemd (via /run/systemd/system)
>>>>
>>>> Versions of packages coreutils depends on:
>>>> ii libacl1 2.2.52-2
>>>> ii libattr1 1:2.4.47-2
>>>> ii libc6 2.19-18+deb8u4
>>>> ii libselinux1 2.3-2
>>>>
>>>> coreutils recommends no packages.
>>>>
>>>> coreutils suggests no packages.
>>>>
>>>> -- no debconf information
>>>
>>> Is your locale really "C" ?
>>> With mine set to "C" I get:
>>>
>>> $ LANG=C ls -l
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???LTR
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???RTL
>>>
>>> $ LANG=C ls -lb
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\252LTR
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\253RTL
>>>
>>>
>>> With the new quoting in version 8.25 you'll get a directly
>>> copy and pasteable representation like:
>>>
>>> $ LANG=C ~/git/coreutils/src/ls -l
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\252''LTR'
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\253''RTL'
>>>
>>>
>>> I'll need to experiment a bit with non "C" locale handling,
>>> and with various terminals, to see how best to handle this case.
>>>
>>> thanks,
>>> Pádraig
>>>
>>
>> Not really, I haven't set any locale on my servers intentionally. Or
>> rather, left it at the "POSIX"(?) default during d-i.
>> $ localectl status
>> System Locale: n/a
>>
>> VC Keymap: n/a
>> X11 Layout: de
>> X11 Model: pc105
>> X11 Variant: nodeadkeys
>> $ cat /etc/default/locale
>> #LANG="C"
>> $ env | grep LANG
>> $ env | grep LC_
>> $
>>
>> With both LC_ALL=C and LANG=C it shows at least some indication that
>> there are other characters. But why not when no explicit locale has been
>> set?
>
> Maybe because it's UTF8 based?
> I also noticed that in gnome-terminal you can copy/paste the hidden chars
> by also selecting the leading space on the file name (though that's certainly
> not obvious).
> xterm gives a visual indication of an extra char, and allows selecting it.
> So there is an overlap here with terminal handling of the RTL chars
>
Same with xfce4-terminal. However, without any visual indication in the
first place, who would try to copy/paste in order to find out _which_
characters are there? Also, when copying out, leafpad and gedit don't
display the characters, while (g)vim does.

I generated some locales to test. Behavior isn't consistent across UTF8
locales:

$ LANG=de_AT.ISO-8859-1 /bin/ls -lb
insgesamt 4
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 �\200�LTR
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 �\200�RTL
-rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
$ LANG=de_AT.UTF8 /bin/ls -lb
insgesamt 4
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 LTR
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 RTL
-rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
$ LANG=C /bin/ls -lb
total 4
-rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 \342\200\252LTR
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 \342\200\253RTL
$ LANG=C.UTF8 /bin/ls -lb
total 4
-rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 \342\200\252LTR
-rw-r--r--. 1 peter peter 0 Aug 10 15:11 \342\200\253RTL

In case it doesn't display right:
* The first output contains 2 unprintable characters enclosing \200 at
  the beginning of each filename
* The second output contains the raw Unicode characters, at least when
  viewed in vim
* The third and fourth outputs contain the escaped sequences \342\200\252
  (LTR) and \342\200\253 (RTL)
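
For reference, a minimal sketch of a way to make such bytes visible that
does not depend on the locale at all. It assumes a POSIX shell with od and
awk available; the helper name escape_name is made up for illustration and
is not part of coreutils:

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): print a file name with every
# byte outside printable ASCII, plus the backslash itself, as a \ooo octal
# escape. od works on raw bytes, so the result does not depend on the locale.
escape_name() {
    printf '%s' "$1" | od -An -v -to1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                # od prints each byte as three octal digits; convert to a number
                n = substr($i, 1, 1) * 64 + substr($i, 2, 1) * 8 + substr($i, 3, 1)
                if (n >= 32 && n <= 126 && n != 92)
                    printf "%c", n
                else
                    printf "\\%s", $i
            }
        }
        END { print "" }'
}

# Example: a name starting with U+202A (bytes 342 200 252 in UTF-8)
escape_name "$(printf '\342\200\252LTR')"
```

The example call prints \342\200\252LTR, i.e. the same form that ls -b
gave under LANG=C above. This only sidesteps the locale question on the
command line; it says nothing about how ls itself should decide which
characters to escape.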