[EMAIL PROTECTED] writes:
: Followup to:  <[EMAIL PROTECTED]>
: By author:    Jarkko Hietaniemi <[EMAIL PROTECTED]>
: In newsgroup: linux.utf8
: >
: > Note that most of the time it shouldn't matter whether chr() generates
: > ISO 8859-1 or UTF-8 encoded Unicode for the range 0x80..0xff, since
: > the upgrade from 8-bit to UTF-8 is automatic.
: > 
: 
: *UNICODE* or *UTF-8*?

An ordinary string is already considered to be Unicode, even if it's
represented internally in ISO-8859-1 bytes.  But by representing such
strings in ISO-8859-1 until an upgrade is forced, we maintain some
degree of compatibility with old binary applications that assume char
is 8 bits.
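
A minimal sketch of that upgrade, assuming a perl recent enough to have
utf8::is_utf8() (5.8.1 or later). The helper name upgrade_demo is
hypothetical and only there for illustration; utf8::is_utf8() merely
reports which internal representation a string currently uses.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the internal representation of a string
# before and after it is joined with a character above 0xff.
sub upgrade_demo {
    my $s = chr(0xc0);                      # one character, U+00C0
    my $before = utf8::is_utf8($s) ? 1 : 0; # still held as a single byte

    my $t = $s . chr(0x1c0);                # U+01C0 cannot fit in 8 bits
    my $after = utf8::is_utf8($t) ? 1 : 0;  # whole string now held as UTF-8

    # The characters themselves are unchanged by the upgrade.
    return ($before, $after, length($t), ord(substr($t, 0, 1)));
}

my ($before, $after, $len, $first) = upgrade_demo();
printf "before=%d after=%d length=%d first=0x%x\n",
    $before, $after, $len, $first;
```

The string's length and code points stay the same across the upgrade;
only the internal byte layout changes.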

Perl 6 will likely allow strings to be marked with their character
set, in which case they will automatically be converted to unmarked
Unicode strings as necessary.

: If what chr() returns is a string encoded in Unicode, of which UTF-8
: is an encoding, that is one thing.  If:
: 
: print FILE chr(0xc0);
: 
: ... prints a naked 0xc0 byte (instead of 0xc3 0x80) but
: 
: print FILE chr(0x1c0);
: 
: ... prints 0xc7 0x80, then that is a serious case of braindamage.

The filehandle will know when it's producing UTF-8, and the conversion
will be automatically applied.
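
As an illustration only, here is how this can look with the I/O layers
of later perls (5.8 and up), which is an assumption beyond what this
message specifies. The helper bytes_written is hypothetical; it prints
chr() of a code point through an in-memory filehandle opened with the
given layer and returns the bytes that came out, in hex.

```perl
use strict;
use warnings;

# Hypothetical helper: write chr($code) through a filehandle opened
# with $layer and return the resulting bytes as a hex string.
sub bytes_written {
    my ($code, $layer) = @_;
    my $buf = '';
    open(my $fh, ">$layer", \$buf) or die "open: $!";
    print {$fh} chr($code);
    close($fh) or die "close: $!";
    return join ' ', map { sprintf '%02x', ord($_) } split //, $buf;
}

# A handle that knows it produces UTF-8 encodes both cases the same way.
print bytes_written(0xc0,  ':encoding(UTF-8)'), "\n";   # c3 80
print bytes_written(0x1c0, ':encoding(UTF-8)'), "\n";   # c7 80

# A raw byte handle passes the 8-bit character through unchanged.
print bytes_written(0xc0,  ':raw'), "\n";               # c0
```

With the encoding declared on the filehandle, the output no longer
depends on how the string happens to be stored internally.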

Larry
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
