Charles Mills writes:
>You could use 16 bits for every character, with some sort of
>cleverness that yielded two 16-bit words when you had a code
>point bigger than 65535 (actually somewhat less due to how the
>cleverness works). That is called UTF-16. Pretty good but
>still not very efficient.
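
(The "cleverness" Charles describes is the surrogate-pair mechanism: a code
point above U+FFFF is split across two 16-bit units drawn from a reserved
range. A quick sketch in Java, whose String type happens to use UTF-16
internally; the particular code point is just an arbitrary example:

  public class SurrogateDemo {
      public static void main(String[] args) {
          // U+1F600 lies above U+FFFF, so it needs two 16-bit code units.
          int codePoint = 0x1F600;
          char[] units = Character.toChars(codePoint);
          System.out.printf("U+%X -> %d UTF-16 code units:%n",
                            codePoint, units.length);
          for (char u : units) {
              System.out.printf("  0x%04X%n", (int) u);
          }
          // The 2,048 code points U+D800..U+DFFF are reserved for these
          // surrogates, which is why "somewhat less" than 65,536 characters
          // fit in a single 16-bit unit.
          System.out.println("high surrogate? " + Character.isHighSurrogate(units[0]));
          System.out.println("low surrogate?  " + Character.isLowSurrogate(units[1]));
      }
  }

For this example it prints the pair 0xD83D 0xDE00.)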

In Japan and China, to pick a couple of examples, UTF-16 is rather efficient:
the kana, kanji, and hanzi in everyday use take two bytes each in UTF-16 but
three in UTF-8. There are also far worse inefficiencies than using 16 bits to
store each Latin character. In short, I wouldn't get *too* hung up on this
point, especially as the total lifecycle cost of storage continues to fall.
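
To put a rough number on that, here is a small Java comparison of encoded
sizes for a Latin string versus a Japanese one. The sample strings are purely
illustrative; exact counts will of course vary with the text:

  import java.nio.charset.StandardCharsets;

  public class EncodedSizes {
      public static void main(String[] args) {
          report("Latin   ", "Hello, world");
          report("Japanese", "こんにちは世界");  // BMP kana and kanji
      }

      static void report(String label, String s) {
          // UTF_16BE is used so a byte-order mark isn't counted in the total.
          int utf8  = s.getBytes(StandardCharsets.UTF_8).length;
          int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length;
          System.out.printf("%s: %2d bytes in UTF-8, %2d bytes in UTF-16%n",
                            label, utf8, utf16);
      }
  }

The Latin string is 12 bytes in UTF-8 and 24 in UTF-16; the Japanese one is
21 bytes in UTF-8 and 14 in UTF-16, so for that kind of text UTF-16 comes out
ahead.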

For example, if you're designing applications and information systems for a
global audience (or a potentially global audience), it could be a perfectly
reasonable decision to standardize on UTF-16 in exchange for potential
reductions in testing effort, among other benefits. I think that's essentially
what SAP did around the time they introduced their ECC releases.

Somehow I'm reminded of the "save two characters" impulse behind two-digit
years, which then caused a lot of angst in preparing for Y2K. :-) If there's a
reasonable argument
for spending 16 bits -- and sometimes there is -- by all means, spend them.
This isn't 1974 or even 1994. The vast majority of the world's data are not
codepoint-encoded alphanumerics anyway.

--------------------------------------------------------------------------------------------------------
Timothy Sipples
GMU VCT Architect Executive (Based in Singapore)
E-Mail: sipp...@sg.ibm.com