On 10.08.2016 at 17:25, Pádraig Brady wrote:
> On 10/08/16 16:15, Peter Ludikovsky wrote:
>>
>>
>> On 10.08.2016 at 16:51, Pádraig Brady wrote:
>>> On 10/08/16 15:21, Peter Ludikovsky wrote:
>>>> Package: coreutils
>>>> Version: 8.23-4
>>>> Severity: normal
>>>>
>>>> Dear Maintainer,
>>>>
>>>> This came up due to a posting on debian-user-german [1]. Apparently
>>>> certain Unicode characters, at least LEFT-TO-RIGHT EMBEDDING [2] and
>>>> RIGHT-TO-LEFT EMBEDDING [3] do not trigger the escape code display for
>>>> ls with the -b option.
>>>>
>>>> An example script is attached, output:
>>>>
>>>>     $ bash unicode_bidir_test.sh 
>>>>     + touch LTR
>>>>     + touch RTL
>>>>     + /bin/ls -l
>>>>     total 4
>>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 LTR
>>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 RTL
>>>>     -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>>>     + /bin/ls -lb
>>>>     total 4
>>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 LTR
>>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 RTL
>>>>     -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>>>     + /bin/ls -lb LTR
>>>>     /bin/ls: cannot access LTR: No such file or directory
>>>>     + /bin/ls -lb LTR
>>>>     -rw-r--r--. 1 peter peter 0 Aug 10 14:00 LTR
>>>>     + /bin/ls -lb RTL
>>>>     /bin/ls: cannot access RTL: No such file or directory
>>>>     + /bin/ls -lb RTL
>>>>     -rw-r--r--. 1 peter peter 0 Aug 10 14:00 RTL
>>>>
>>>> I would expect those characters to be shown (escaped), as they are
>>>> needed when accessing the file on the command line.
>>>>
>>>> [1] https://lists.debian.org/debian-user-german/2016/08/msg00049.html
>>>> [2] http://www.fileformat.info/info/unicode/char/202a/index.htm
>>>> [3] http://www.fileformat.info/info/unicode/char/202b/index.htm
>>>>
>>>> -- System Information:
>>>> Debian Release: 8.5
>>>>   APT prefers stable-updates
>>>>   APT policy: (500, 'stable-updates'), (500, 'stable')
>>>> Architecture: amd64 (x86_64)
>>>>
>>>> Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
>>>> Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
>>>> Shell: /bin/sh linked to /bin/dash
>>>> Init: systemd (via /run/systemd/system)
>>>>
>>>> Versions of packages coreutils depends on:
>>>> ii  libacl1      2.2.52-2
>>>> ii  libattr1     1:2.4.47-2
>>>> ii  libc6        2.19-18+deb8u4
>>>> ii  libselinux1  2.3-2
>>>>
>>>> coreutils recommends no packages.
>>>>
>>>> coreutils suggests no packages.
>>>>
>>>> -- no debconf information
>>>
>>> Is your locale really "C" ?
>>> With mine set to "C" I get:
>>>
>>> $ LANG=C ls -l
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???LTR
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???RTL
>>>
>>> $ LANG=C ls -lb
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\252LTR
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\253RTL
>>>
>>>
>>> With the new quoting in version 8.25 you'll get a representation that
>>> can be copied and pasted directly, like:
>>>
>>> $ LANG=C ~/git/coreutils/src/ls -l
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\252''LTR'
>>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\253''RTL'
>>>
>>>
>>> I'll need to experiment a bit with non "C" locale handling,
>>> and with various terminals, to see how best to handle this case.
>>>
>>> thanks,
>>> Pádraig
>>>
>>
>> Not really; I haven't intentionally set any locale on my servers, or
>> rather, I left it at the "POSIX"(?) default during d-i.
>>     $ localectl status
>>        System Locale: n/a
>>
>>            VC Keymap: n/a
>>           X11 Layout: de
>>            X11 Model: pc105
>>          X11 Variant: nodeadkeys
>>     $ cat /etc/default/locale
>>     #LANG="C"
>>     $ env | grep LANG
>>     $ env | grep LC_
>>     $
>>
>> With both LC_ALL=C and LANG=C it shows at least some indication that
>> there are other characters. But why not when no explicit locale has been
>> set?
> 
> Maybe because it's UTF8 based?
> I also noticed that in gnome-terminal you can copy/paste the hidden chars
> by also selecting the leading space on the file name (though that's certainly 
> not obvious).
> xterm gives a visual indication of an extra char, and allows selecting it.
> So there is an overlap here with terminal handling of the RTL chars.
> 
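When no LANG or LC_* variable is exported, programs run in the POSIX ("C") locale, which is where `ls -b` escaped the non-ASCII bytes in the LANG=C listings in this thread. A minimal sketch for checking which setting actually reaches a command: the helper `effective_ctype` is hypothetical and only mirrors the POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG; empty values count as unset). It inspects variables, so it is no substitute for `locale`.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): print the locale name that
# decides LC_CTYPE for child processes, using the POSIX precedence
# LC_ALL > LC_CTYPE > LANG. Empty variables are treated as unset; with
# nothing set, the default (POSIX, i.e. C, on glibc) applies.
# It does not check that the named locale is installed; if it is not,
# setlocale() fails and programs stay in the C locale.
effective_ctype() {
  if [ -n "${LC_ALL-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf 'POSIX\n'
  fi
}

effective_ctype
```

`locale charmap` then reports the codeset that results; for the C locale glibc reports `ANSI_X3.4-1968`, as in the reportbug header above.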

xfce4-terminal behaves like gnome-terminal here. However, without any
visual indication in the first place, who would think to copy and paste
in order to find out _which_ characters are there?
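Without a visual cue, such names can at least be searched for. A sketch, assuming `find` plus a `cat` that supports `-v` (GNU, BSD and BusyBox do; POSIX does not require it): match the UTF-8 bytes of the common bidi controls (U+200E, U+200F and U+202A to U+202E) with LC_ALL=C so the comparison is byte-wise, then make the hits visible with `cat -v`. The function name `find_bidi_names` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list paths below $1 (default ".") whose names contain
# U+200E, U+200F or U+202A..U+202E. In UTF-8 these all start with the bytes
# \342\200, followed by \216, \217 or \252..\256. LC_ALL=C makes find compare
# raw bytes, whatever locale the session uses.
find_bidi_names() {
  bidi_lead=$(printf '\342\200')
  bidi_tail=$(printf '\216\217\252\253\254\255\256')
  # cat -v renders the non-ASCII bytes visibly, e.g. U+202A becomes M-bM-^@M-*
  LC_ALL=C find "${1:-.}" -name "*${bidi_lead}[${bidi_tail}]*" | cat -v
}
```

For example, `find_bidi_names .` in the test directory from the report would list the LTR and RTL files as `./M-bM-^@M-*LTR` and `./M-bM-^@M-+RTL`.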

Also, when the copied names are pasted into an editor, leafpad and gedit
don't display the characters, while (g)vim does.
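To identify the characters without depending on a particular terminal or editor, the octal escapes that `ls -b` prints in the C locale can be decoded by hand. A minimal sketch, assuming the escapes form one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx, i.e. U+0800 to U+FFFF); the helper name `utf8_decode3` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode one 3-byte UTF-8 sequence, given as the three
# octal byte values that ls -b prints (e.g. 342 200 252), into its code point.
# Only the 3-byte form 1110xxxx 10xxxxxx 10xxxxxx is handled.
utf8_decode3() {
  # A leading 0 makes shell arithmetic read the values as octal.
  b1=$(( 0$1 )) b2=$(( 0$2 )) b3=$(( 0$3 ))
  printf 'U+%04X\n' $(( ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F) ))
}

utf8_decode3 342 200 252   # prints U+202A (LEFT-TO-RIGHT EMBEDDING)
utf8_decode3 342 200 253   # prints U+202B (RIGHT-TO-LEFT EMBEDDING)
```

The resulting code points match the fileformat.info pages linked in the original report.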

I generated some locales to test; the behavior isn't consistent across
UTF-8 locales:
    $ LANG=de_AT.ISO-8859-1 /bin/ls -lb
    insgesamt 4
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 �\200�LTR
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 �\200�RTL
    -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
    $ LANG=de_AT.UTF8 /bin/ls -lb
    insgesamt 4
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 ‪LTR
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 ‫RTL
    -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
    $ LANG=C /bin/ls -lb
    total 4
    -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 \342\200\252LTR
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 \342\200\253RTL
    $ LANG=C.UTF8 /bin/ls -lb
    total 4
    -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 \342\200\252LTR
    -rw-r--r--. 1 peter peter   0 Aug 10 15:11 \342\200\253RTL

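In case it doesn't display right:
 * The first output (ISO-8859-1) contains two unprintable characters
enclosing \200 at the beginning of each file name. Only the byte \200 is
escaped; \342 and \252/\253 are printable in ISO-8859-1 (â, ª, «) and are
passed through raw, which a UTF-8 terminal cannot display.
 * The second output (de_AT.UTF8) contains the Unicode characters
themselves, unescaped; they are visible at least when viewed in vim.
 * The third and fourth outputs (C and C.UTF8) contain the escaped
sequences \342\200\252 and \342\200\253.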
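The locale comparison can be repeated with a short script. This is a hypothetical reconstruction, not the attached unicode_bidir_test.sh (which is not reproduced in the thread). It builds the names with printf octal escapes so it also runs under dash, and it sets LC_ALL rather than LANG so that an exported LC_ALL or LC_CTYPE cannot override the locale under test.

```shell
#!/bin/sh
# Hypothetical reconstruction of the locale comparison; the attached
# unicode_bidir_test.sh itself is not part of this thread.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# U+202A LEFT-TO-RIGHT EMBEDDING is \342\200\252 in UTF-8,
# U+202B RIGHT-TO-LEFT EMBEDDING is \342\200\253.
touch "$(printf '\342\200\252')LTR" "$(printf '\342\200\253')RTL"

# LC_ALL overrides LC_CTYPE and LANG, so each run really uses $loc.
# A locale that is not installed falls back to C.
for loc in de_AT.ISO-8859-1 de_AT.UTF8 C C.UTF8; do
  printf '== %s ==\n' "$loc"
  LC_ALL=$loc ls -b
done
```

Check `locale -a` first, since a missing locale makes its run look like the C one. The same printf trick addresses such a file directly, e.g. `ls -l -- "$(printf '\342\200\252')LTR"`; bash, zsh and ksh93 also accept `$'\342\200\252LTR'`, the syntax used by the 8.25 quoting quoted earlier in the thread.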
