On Mon, 16-08-2004, at 15:29 +0300, Jarkko Hietaniemi wrote:

> > Filenames should be assumed to use the locale's encoding by default,
>
> Which is wrong, too. In Win32 and Mac OS X filenames are often Unicode,

If the filename API of a particular platform expects Unicode, then Perl
should of course convert a filename to Unicode, expressed in whatever
form the OS wants (e.g. UTF-16 on WinNT).

On Linux the filename API expects a byte encoding. Technically it
doesn't interpret the bytes (other than '/', '\0' and '.') if the
underlying filesystem is ext2, but it does recode them e.g. for FAT.
A mount option specifies the encoding to use in the Linux API. It had
better be consistent with the locale, because almost all programs
interpret filenames according to the current locale, since most of them
don't actually recode them at all.

> > If Perl scalars are a mixture of ISO-8859-1 and UTF-8, instead of a
> > mixture of the default locale encoding and UTF-8, how to tell Perl to
> > recode external strings (default I/O, including stdin/stdout/stderr,
> > @ARGV, filenames) between the default locale encoding and Perl's
> > internal encodings?
>
> I would really appreciate if people would run perluniintro, and
> perlrun/-C, but I have already given up the hope.

It only works for UTF-8, not for other encodings (perl-5.8.4):

$ perl -C -e 'print chr(0x104), "\n"'
Wide character in print at -e line 1.
[gibberish output, UTF-8-reinterpreted-as-ISO-8859-2]

The default encoding (of the locale and of the terminal) is ISO-8859-2,
which *is* capable of representing U+0104.

$ perl -C -e 'printf "U+%04X\n", ord($ARGV[0])' Ą
U+00A1

Again, it's not U+00A1 (inverted exclamation mark), but U+0104 (latin
capital letter A with ogonek), which is encoded in ISO-8859-2 as 0xA1.

In summary, some parts of Perl treat non-UTF-8 scalars as ISO-8859-1,
while others treat them as whatever is expected by default in files,
filenames and the command line (the locale tells what it is).
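
To make the two interpretations concrete, here is a minimal sketch using
the core Encode module. It assumes the argument arrives as the single
byte 0xA1, as in the ISO-8859-2 session above; the variable names are
only illustrative and nothing here is specific to how perl itself
handles @ARGV internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the raw byte that an ISO-8859-2 terminal passes for "Ą".
my $bytes = "\xA1";

# Read the byte as ISO-8859-1, which is how a non-UTF-8 scalar is
# treated when its characters are inspected with ord().
my $as_latin1 = decode('ISO-8859-1', $bytes);

# Read the same byte as ISO-8859-2, the locale encoding in this example.
my $as_latin2 = decode('ISO-8859-2', $bytes);

printf "ISO-8859-1: U+%04X\n", ord($as_latin1);
printf "ISO-8859-2: U+%04X\n", ord($as_latin2);
```

Running it prints U+00A1 for the first line and U+0104 for the second:
the same byte yields two different characters depending on which
encoding is assumed.
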
It should be decided one way or the other, otherwise generic code
doesn't know how to interpret Perl scalars it encounters.

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/