[EMAIL PROTECTED] writes:
: Followup to:  <[EMAIL PROTECTED]>
: By author:    Jarkko Hietaniemi <[EMAIL PROTECTED]>
: In newsgroup: linux.utf8
: >
: > Note that for most of the time, the difference whether chr() generates
: > ISO 8859-1 or UTF-8 encoded Unicode for the range 0x80..0xff shouldn't
: > matter, since the upgrading of the 8-bit to UTF-8 is automatic.
: >
:
: *UNICODE* or *UTF-8*?

An ordinary string is already considered to be Unicode, even if it's
represented internally in ISO-8859-1 bytes.  But by representing such
strings in ISO-8859-1 until an upgrade is forced, we maintain some
degree of compatibility with old binary applications that assume char
is 8 bits.  Perl 6 will likely allow strings to be marked as to their
character set, in which case they will automatically be converted to
unmarked Unicode strings as necessary.

: If what chr() returns is a string encoded in Unicode, of which UTF-8
: is an encoding, that is one thing.  If:
:
:       print FILE chr(0xc0);
:
: ... prints a naked 0xc0 byte (instead of 0xc3 0x80) but
:
:       print FILE chr(0x1c0);
:
: ... prints 0xc7 0x80, then that is a serious case of braindamage.

The filehandle will know when it's producing UTF-8, and the conversion
will be automatically applied.

Larry
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
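
For anyone who wants to check the byte values quoted in this thread, here
is a minimal sketch. It assumes a Perl that ships the Encode module, which
is not mentioned anywhere in the message above, and the helper name
hex_bytes is made up for illustration. It does not show what print does on
any particular filehandle; it only shows which bytes each code point
occupies in each encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);   # assumption: Encode is available (core in later Perls)

# Hypothetical helper: return the bytes that code point $cp occupies
# in $encoding, as space-separated hex pairs.
sub hex_bytes {
    my ($encoding, $cp) = @_;
    my $octets = encode($encoding, chr($cp));
    return join ' ', unpack('(H2)*', $octets);
}

print "U+00C0 as ISO-8859-1: ", hex_bytes('ISO-8859-1', 0xc0), "\n";
print "U+00C0 as UTF-8:      ", hex_bytes('UTF-8', 0xc0), "\n";
print "U+01C0 as UTF-8:      ", hex_bytes('UTF-8', 0x1c0), "\n";
```

The first line is the "naked" single byte from the quoted complaint, and
the other two are the two-byte UTF-8 sequences it mentions. U+01C0 has no
ISO-8859-1 form at all, which is why that case can only come out as UTF-8.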