On Fri, Apr 05, 2002 at 10:35:29AM -0500, Jungshik Shin wrote:
> On Fri, 5 Apr 2002, Jarkko Hietaniemi wrote:
>
> > > P.S. Does utf8 support surrogates? Surrogate pair is definitely the
> >
> > No. Surrogates are solely for UTF-16. There's no need for surrogates
> > in UTF-8 -- if we wanted to encode U+D800 using UTF-8, we *could* --
> > BUT we should not. Encoding U+D800 as UTF-8 should not be attempted,
> > the whole surrogate space is a discontinuity in the Unicode code point
> > space reserved for the evils of UTF-16.
>
> I can't agree more with you on this. Unfortunately, people
> at Oracle and PeopleSoft think differently. Actually, what happened was
> that they made a serious design mistake by making their DBs understand
> only UTF-8 up to 3byte long although when they added UTF-8 support,
> it was plainly clear that ISO 10646/Unicode was not just for BMP.
> When planes beyond BMP finally began to be filled with actual characters,
> they came up with that stupid idea of using two 3-byte-long UTF-8 units
> (for surrogate pairs) to represent those characters.
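To make the byte-level difference concrete, here is a minimal sketch in Python. It is not Perl, and it is not based on Encode or on any Oracle/PeopleSoft code. It follows the scheme of CESU-8 (UTR #26) as summarised above: BMP characters are encoded exactly as in UTF-8, while a character beyond the BMP is first split into a UTF-16 surrogate pair and each surrogate is then written as its own 3-byte sequence. The function name `encode_cesu8` is hypothetical.

```python
def encode_cesu8(text: str) -> bytes:
    """Hypothetical helper: encode text in the CESU-8 style.

    BMP code points use the ordinary UTF-8 form (1 to 3 bytes).
    Supplementary code points become a UTF-16 surrogate pair, and each
    surrogate is written as a separate 3-byte sequence (6 bytes total).
    Lone surrogates in the input are rejected: str.encode("utf-8")
    raises UnicodeEncodeError for them.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Basic Multilingual Plane: identical to standard UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split into a UTF-16 surrogate pair.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            # Write each 16-bit surrogate with the 3-byte UTF-8 bit pattern
            # 1110xxxx 10xxxxxx 10xxxxxx.
            for unit in (high, low):
                out += bytes((0xE0 | (unit >> 12),
                              0x80 | ((unit >> 6) & 0x3F),
                              0x80 | (unit & 0x3F)))
    return bytes(out)


if __name__ == "__main__":
    s = "\U00010400"  # a character outside the BMP
    print(s.encode("utf-8").hex())  # standard UTF-8: 4 bytes
    print(encode_cesu8(s).hex())    # CESU-8 style: 6 bytes
```

For U+10400 this gives the six bytes `ED A0 81 ED B0 80`, where standard UTF-8 gives the four bytes `F0 90 90 80`. A strict UTF-8 decoder (Python's included) rejects the CESU-8 bytes because they contain encoded surrogates, which is exactly the incompatibility being objected to here.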
Yeah. I saw that sorry mess. It looked a lot like a bunch of engineers
unable to admit that they made a mistake, and a bunch of managers unable
to admit that they shipped broken products to their customers.

> A lot of people on Unicode mailing list voiced a very strong
> and technically solid objection against this, but Oracle and PeopleSoft
> went on to publish DUTR #26: Compatibility Encoding Scheme for UTF-16
> (CESU-8) (http://www.unicode.org/unicode/reports/tr26). Does Encode
> need to support this monster? I hope not.

Definitely not. If Oracle/PeopleSoft want to support their own made-up
encoding with Perl, they are welcome to write Encode::CESU8...

> Jungshik Shin

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen