Another possibility is an extended UTF-8 scheme in which values above 0x10FFFF encode temporary code block swaps. I.e., some magic value means that the one-byte UTF-8 codes now refer to the Greek block instead of the ASCII block. But you would need broad agreement for that to work. As Dan said, this really needs a separation between encoding and character set.
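
To make the block-swap idea concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration, not an existing standard: the magic range starts at a hypothetical SWAP_BASE of 0x110000, a swap value selects a 128-code-point window, and the helper names (_encode_value, encode, decode) are made up. Swap values are written with the ordinary 4-byte UTF-8 bit layout, which has room for values up to 0x1FFFFF. The decoder does no validation of malformed input.

```python
SWAP_BASE = 0x110000  # hypothetical: first value past the Unicode range
WINDOW = 0x80         # one-byte codes cover a window of 128 code points


def _encode_value(v: int) -> bytes:
    """Encode v with the UTF-8 bit layout; 4 bytes hold values up to 0x1FFFFF."""
    if v < 0x80:
        return bytes([v])
    if v < 0x800:
        return bytes([0xC0 | (v >> 6), 0x80 | (v & 0x3F)])
    if v < 0x10000:
        return bytes([0xE0 | (v >> 12), 0x80 | ((v >> 6) & 0x3F),
                      0x80 | (v & 0x3F)])
    return bytes([0xF0 | (v >> 18), 0x80 | ((v >> 12) & 0x3F),
                  0x80 | ((v >> 6) & 0x3F), 0x80 | (v & 0x3F)])


def encode(text: str) -> bytes:
    out = bytearray()
    base = 0  # start with the ASCII block mapped onto the one-byte codes
    for ch in text:
        cp = ord(ch)
        if not base <= cp < base + WINDOW:
            # Outside the current window: emit a swap value, then move the window.
            base = cp - cp % WINDOW
            out += _encode_value(SWAP_BASE + base // WINDOW)
        out.append(cp - base)  # one byte, relative to the current window
    return bytes(out)


def decode(data: bytes) -> str:
    out = []
    base = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # One-byte code: interpreted relative to the current window.
            out.append(chr(base + lead))
            i += 1
            continue
        # Multi-byte sequence: length from the lead byte (no validation here).
        n = 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        v = lead & (0xFF >> (n + 1))
        for b in data[i + 1:i + n]:
            v = (v << 6) | (b & 0x3F)
        i += n
        if v >= SWAP_BASE:
            base = (v - SWAP_BASE) * WINDOW  # swap value: move the window
        else:
            out.append(chr(v))  # ordinary multi-byte UTF-8 character
    return "".join(out)
```

With this naive encoder every character outside the current window forces a 4-byte swap, so a space inside Greek text costs a swap to ASCII and another one back. A real design would need a smarter switching policy, plus the broad agreement mentioned above.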
--
Mark Biggar
[EMAIL PROTECTED]

> At 12:28 AM +0100 3/16/04, Karl Brodowsky wrote:
> >Anyway, it will be necessary to specify the encoding of unicode in
> >some way, which could possibly allow even to specify even some
> >non-unicode-charsets.
>
> While I'll skip diving deeper into the swamp that is character sets
> and encoding (I'm already up to my neck in it, thanks, and I don't
> have any long straws handy :) I'll point out that the above statement
> is meaningless--there *are* no Unicode non-unicode charsets.
>
> It is possible to use the UTF encodings on non-unicode charsets--you
> could reasonably use UTF-8 to encode, say, Shift-JIS characters.
> (where Shift-JIS is both an encoding and a character set, and it can
> be separated into pieces)
>
> It's not unwise (and, in practice, at least in implementation quite
> sensible) to separate the encoding from the character set, but you
> need to be careful to keep the separation clear, though many of the
> sets and encodings don't go out of their way to help with that.
> --
> Dan
>
> --------------------------------------"it's like this"-------------------
> Dan Sugalski                          even samurai
> [EMAIL PROTECTED]                     have teddy bears and even
>                                       teddy bears get drunk