Isn't that what the difference between byte-level and codepoint-level access to strings is all about? If you want to work with values that are illegal codepoints, then you should be working at the byte level, not the codepoint level, at least by default.
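To make the byte-level versus codepoint-level split concrete, here is a minimal sketch in Python (not Perl, and not a statement of how Perl's chr() behaves). The names `is_noncharacter`, `strict_chr` and `raw_utf8` are hypothetical helpers invented for illustration, and the set of rejected values follows the Unicode standard's noncharacter definition, which is an assumption about what a strict chr() would refuse.

```python
# Hypothetical helpers for illustration only; not Perl's chr() or any library API.

def is_noncharacter(cp: int) -> bool:
    """True for code points Unicode designates as noncharacters.

    Assumes cp is already known to lie in 0..0x10FFFF.
    """
    # U+FDD0..U+FDEF, plus the last two code points of every plane
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strict_chr(cp: int) -> str:
    """Codepoint-level access: refuse values that are not valid characters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF or is_noncharacter(cp):
        # An argument error, in the same spirit as sqrt(-1).
        raise ValueError(f"U+{cp:04X} is not a valid character")
    return chr(cp)


def raw_utf8(cp: int) -> bytes:
    """Byte-level access: emit the UTF-8 byte pattern without validating cp."""
    # "surrogatepass" lets even surrogate code points through to bytes.
    return chr(cp).encode("utf-8", "surrogatepass")
```

With these definitions, `strict_chr(65535)` raises `ValueError`, while `raw_utf8(65535)` returns the three bytes `b"\xef\xbf\xbf"`, so the illegal value can still be talked about, just not by default at the codepoint level.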
--
Mark Biggar

> On Fri, Apr 15, 2005 at 12:56:14AM -0700, Mark A. Biggar wrote:
> : Yes, the value 0xFFFF can be stored as either a 3-byte UTF-8 string or a
> : 2-byte UCS-2 value, but the Unicode standard specifically says that the
> : values 0xFFFF, 0xFFFE and 0xFEFF are NOT valid codepoints and should
> : never appear in a Unicode string. 0xFFFF is reserved for out-of-band
> : signaling (such as the -1 returned by getc()) and 0xFFFE and 0xFEFF are
> : specifically reserved for out-of-band marking of a UCS-2 file as being
> : either big-endian or little-endian, but are specifically not considered
> : part of the data. chr() is currently defined to mean convert an int
> : value to a Unicode codepoint. That's why I said that chr(65535) should
> : raise an exception; it's an argument error similar to sqrt(-1).
>
> It has to at least be possible to Think Bad Thoughts in Perl.
> It doesn't have to be the default, though. But there has to be
> some way of allowing illegal characters to be talked about, or
> you can't write programs that talk about them. It's like saying
> it's okay to be an executioner as long as you don't kill anyone...
>
> Larry