On May 26, 2011, at 22:56, Andrew Thompson wrote:

> I believe this stems from a period in history when the Unicode group believed
> that they'd be able to fit all practical scripts into 65536 code points,
> which meant you could get away with all kinds of assumptions, like 16-bit
> types and UCS-2.
>
> As it became clear that wasn't going to be enough code points, the additional
> planes were defined and UCS-2 fell out of favor, replaced by UTF-16, which
> can encode the higher planes.
That would explain the parting of the ways between "code unit" and "code point", but not really the distinction between "code point" and "[Unicode] character".

My memory of the days when Unicode first started to get a foothold (the early 90s, IIRC) is very hazy, but I think there were actually two things going on:

-- The belief, exactly as you describe, that 65536 code points would be enough.
-- A vagueness (or perhaps a deliberate lack of definition) about what should be called a "character".

This seems to have been resolved now, and we have this hierarchy, at least in Unicode/Apple terms:

code unit -> code point -> character -> grapheme -> (whatever the grouping is called upon which transformations like upper- and lowercasing are performed)

It's not ultimately so hard, just a bit perilous for the unwary. That's the reason I've been going on about this ad nauseam: if we shine some light on it, we may help demystify it. A short sketch of the hierarchy in code follows below.
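To make the first three levels concrete, here is a minimal sketch in Swift, whose string views happen to map onto these distinctions directly. The string itself is just an illustration I've constructed so that every level gives a different count; nothing here comes from the standard.

    // "e" + U+0301 COMBINING ACUTE ACCENT: two code points that form one
    // user-perceived character. U+1F600 sits outside the BMP, so it needs
    // a surrogate pair, i.e. two UTF-16 code units.
    let s = "e\u{0301}\u{1F600}"   // renders as "é😀"

    // Code units: the 16-bit UTF-16 units that NSString exposes as storage.
    // 1 ("e") + 1 (U+0301) + 2 (surrogate pair for U+1F600) = 4
    print(s.utf16.count)           // 4

    // Code points: Unicode scalar values.
    // U+0065, U+0301, U+1F600 = 3
    print(s.unicodeScalars.count)  // 3

    // Characters (grapheme clusters): what the user perceives.
    // "é" and "😀" = 2
    print(s.count)                 // 2

Note that NSString's -length reports UTF-16 code units (4 here, not 2), which is exactly the "code unit vs. character" trap that makes this hierarchy perilous for the unwary.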