Re: [Encode] UCS/UTF mess and Surrogate Handlings

Jarkko Hietaniemi Fri, 05 Apr 2002 07:21:04 -0800

On Fri, Apr 05, 2002 at 10:35:29AM -0500, Jungshik Shin wrote:
> On Fri, 5 Apr 2002, Jarkko Hietaniemi wrote:
> 
> > > P.S.  Does utf8 support surrogates?  Surrogate pair is definitely the 
> > 
> > No.  Surrogates are solely for UTF-16.  There's no need for surrogates
> > in UTF-8 -- if we wanted to encode U+D800 using UTF-8, we *could* --
> > BUT we should not.  Encoding U+D800 as UTF-8 should not be attempted,
> > the whole surrogate space is a discontinuity in the Unicode code point
> > space reserved for the evils of UTF-16.
> 
>   I can't agree more with you on this. Unfortunately, people
> at Oracle and PeopleSoft think differently. Actually, what happened was
> that they made a serious design mistake by making their DBs understand
> only UTF-8 sequences up to 3 bytes long, although when they added UTF-8
> support it was plainly clear that ISO 10646/Unicode was not just for the BMP.
> When planes beyond the BMP finally began to be filled with actual characters,
> they came up with that stupid idea of using two 3-byte-long UTF-8 units
> (for surrogate pairs) to represent those characters.
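
For anyone who has not seen the scheme, here is a minimal Python sketch of
the difference, not anything Encode ships. The helper names
(utf16_surrogates, three_byte_form, cesu8_encode) are made up for
illustration, and U+10400 is just a convenient character outside the BMP.
It compares the real 4-byte UTF-8 form with the "two 3-byte units" form
described above.

```python
def utf16_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into the
    (high, low) surrogate pair that UTF-16 would use for it."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


def three_byte_form(value):
    """Apply the 3-byte UTF-8 bit pattern to a 16-bit value.
    Hypothetical helper: it is only ever fed surrogates here, which
    real UTF-8 never encodes."""
    return bytes([
        0xE0 | (value >> 12),
        0x80 | ((value >> 6) & 0x3F),
        0x80 | (value & 0x3F),
    ])


def cesu8_encode(text):
    """Encode text the CESU-8 way: BMP characters as ordinary UTF-8,
    everything above the BMP as two 3-byte encoded surrogates."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            high, low = utf16_surrogates(cp)
            out += three_byte_form(high) + three_byte_form(low)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


ch = "\U00010400"
print(ch.encode("utf-8").hex())   # f0909080      (4 bytes, real UTF-8)
print(cesu8_encode(ch).hex())     # eda081edb080  (6 bytes, CESU-8)
```

A strict UTF-8 decoder rejects the 6-byte form: in Python 3,
cesu8_encode(ch).decode("utf-8") raises UnicodeDecodeError, so the two
encodings do not interoperate even though the bytes look similar.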


Yeah.  I saw that sorry mess.  It looked a lot like a bunch of
engineers unable to admit that they made a mistake, and a bunch of
managers unable to admit that they shipped broken products to
their customers.

>   A lot of people on the Unicode mailing list voiced very strong
> and technically solid objections to this, but Oracle and PeopleSoft
> went on to publish DUTR #26: Compatibility Encoding Scheme for UTF-16
> (CESU-8) (http://www.unicode.org/unicode/reports/tr26). Does Encode
> need to support this monster?  I hope not.

Definitely not.  If Oracle/PeopleSoft want to support their own
made-up encoding with Perl, they are welcome to write Encode::CESU8...

>    Jungshik Shin

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
