On Mon, 16-08-2004 at 15:29 +0300, Jarkko Hietaniemi
wrote:

> > Filenames should be assumed to use the locale's encoding by default,
> 
> Which is wrong, too.  In Win32 and Mac OS X filenames are often Unicode,

If the filename API of a particular platform expects Unicode, then Perl
should of course convert a filename to Unicode, expressed in whatever
form the OS wants (e.g. UTF-16 on WinNT).
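A minimal Perl sketch of that conversion, assuming the target is the UTF-16LE form taken by the wide-character file functions of WinNT. The helper `to_winnt_filename` is hypothetical; it only shows the encoding step with the core `Encode` module, not how Perl's Win32 port actually calls the OS.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into UTF-16LE bytes
# (no BOM), the form WinNT's wide-character file functions expect.
sub to_winnt_filename {
    my ($name) = @_;
    return encode('UTF-16LE', $name);
}

# "\x{104}" is U+0104, LATIN CAPITAL LETTER A WITH OGONEK.
my $bytes = to_winnt_filename("\x{104}.txt");

# Show the resulting bytes in hex.
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes)), "\n";
# prints: 04 01 2E 00 74 00 78 00 74 00
```

Each character becomes one little-endian 16-bit unit, so U+0104 turns into the bytes 04 01.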

On Linux the filename API expects a byte encoding. Technically the kernel
doesn't interpret the bytes (other than '/', '\0' and '.') if the
underlying filesystem is ext2, but it does recode them for some
filesystems, e.g. FAT. A mount option specifies the encoding to use in
the Linux API. It had better be consistent with the locale, because
almost all programs effectively interpret filenames according to the
current locale: most of them don't recode filenames at all, so the bytes
are displayed and typed in the locale's encoding.

> > If Perl scalars are a mixture of ISO-8859-1 and UTF-8, instead of a
> > mixture of the default locale encoding and UTF-8, how to tell Perl to
> > recode external strings (default I/O, including stdin/stdout/stderr,
> > @ARGV, filenames) between the default locale encoding and Perl's
> > internal encodings?
> 
> I would really appreciate if people would run perluniintro, and
> perlrun/-C, but I have already given up the hope.

It only works for UTF-8, not for other encodings (perl-5.8.4):

$ perl -C -e 'print chr(0x104), "\n"'
Wide character in print at -e line 1.
[gibberish output: UTF-8 bytes reinterpreted as ISO-8859-2]

The default encoding (of the locale and of the terminal) is ISO-8859-2,
which *is* capable of representing U+0104.

$ perl -C -e 'printf "U+%04X\n", ord($ARGV[0])' Ą
U+00A1

Again, it's not U+00A1 (inverted exclamation mark), but U+0104
(latin capital letter A with ogonek), which is encoded in ISO-8859-2
as 0xA1.
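The two readings of the byte 0xA1 can be reproduced with the core `Encode` module. This is a sketch, not how `-C` is implemented: `char_from_bytes` is a hypothetical helper, and the last part assumes the locale's codeset name, as returned by `I18N::Langinfo`, is one that `Encode` recognizes.

```perl
use strict;
use warnings;
use Encode qw(decode);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: interpret bytes from outside Perl (an argument,
# a filename, a line of input) in an explicitly named encoding.
sub char_from_bytes {
    my ($encoding, $bytes) = @_;
    return decode($encoding, $bytes);
}

# The byte a Latin-2 terminal sends for the letter A with ogonek.
my $raw = "\xA1";

# Read as ISO-8859-1, which is what an undecoded byte amounts to: U+00A1.
printf "as ISO-8859-1: U+%04X\n", ord(char_from_bytes('ISO-8859-1', $raw));
# Read as ISO-8859-2, the locale's actual encoding: U+0104.
printf "as ISO-8859-2: U+%04X\n", ord(char_from_bytes('ISO-8859-2', $raw));

# Decode the first argument with the locale's own codeset
# (assumption: Encode knows the name langinfo returns).
if (@ARGV) {
    my $codeset = langinfo(CODESET);
    my $arg = char_from_bytes($codeset, $ARGV[0]);
    printf "locale %s: U+%04X\n", $codeset, ord($arg);
}
```

Without arguments it prints `as ISO-8859-1: U+00A1` and `as ISO-8859-2: U+0104`; the same byte names two different characters depending on which encoding the code assumes.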

In summary, some parts of Perl treat non-UTF-8 scalars as ISO-8859-1,
while others treat them as whatever is expected by default in files,
filenames and the command line (the locale tells what it is). It should
be decided one way or the other; otherwise generic code doesn't know how
to interpret the Perl scalars it encounters.

-- 
   __("<         Marcin Kowalczyk
   \__/
    ^^     http://qrnik.knm.org.pl/~qrczak/
