Re: string encoding

Simon Cozens Fri, 16 Feb 2001 03:34:06 -0800

On Thu, Feb 15, 2001 at 04:55:00PM -0800, Hong Zhang wrote:
> > On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote:
> > > The concept of characters has nothing to do with codepoints.
> > > Many characters are composed of more than one codepoint.
> > 
> > This isn't true.
> 
> What do you mean? Have you seen people using multi-byte encoding
> in Japan/China/Korea?

You're talking to the wrong person. Japanese data handling is my graduate
dissertation. :)

The unified Han ideographs in Unicode (Chinese Hànzì, Japanese Kanji, Korean
Hanja; the so-called "Unihan" set) occupy one and only one codepoint each.
Legacy data (EUC and similar multi-byte encodings) can be converted to Unicode
on entry to the core and then processed internally as codepoints.
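To make the distinction between bytes and codepoints concrete, here is a minimal sketch, assuming Python 3 and its standard `euc_jp` codec. The function `to_core` is a hypothetical stand-in for "conversion on entry to the core", not any real interpreter API. Each kanji takes two bytes in EUC-JP but becomes a single codepoint once decoded.

```python
import unicodedata

# "Japanese" written in Kanji: three CJK Unified Ideographs.
text = "日本語"

# In the legacy EUC-JP encoding, each of these characters takes two bytes.
legacy_bytes = text.encode("euc_jp")

def to_core(raw: bytes, encoding: str) -> str:
    """Hypothetical entry point: decode legacy bytes to Unicode on the way in."""
    return raw.decode(encoding)

core_string = to_core(legacy_bytes, "euc_jp")

print(len(legacy_bytes))  # 6 bytes in EUC-JP
print(len(core_string))   # 3 codepoints once converted
for ch in core_string:
    # Each ideograph is a single codepoint in the CJK Unified Ideographs block.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The multi-byte structure lives only in the external encoding; after decoding, string length and indexing operate on whole characters.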

Simon

-- 
"Why waste negative entropy on comments, when you could use the same
entropy to create bugs instead?"
-- Steve Elias
