Re: Subject Unicode

Charles Mills Fri, 10 Jan 2014 07:11:38 -0800

Fair enough. I was answering a question about "French Unicode" at five
o'clock. I certainly don't mean to get hung up on "efficiency" and yes, for
certain character distributions, UTF-16 yields a shorter file or message
length than UTF-8.


Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of Timothy Sipples
Sent: Thursday, January 09, 2014 11:31 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Subject Unicode

Charles Mills writes:
>You could use 16 bits for every character, with some sort of cleverness 
>that yielded two 16-bit words when you had a code point bigger than 
>65535 (actually somewhat less due to how the cleverness works). That is 
>called UTF-16. Pretty good but still not very efficient.
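
[Illustration, not part of the original message.] The "cleverness" described above is the surrogate-pair mechanism. A minimal sketch, assuming the standard UTF-16 rules: code points below 0x10000 take one 16-bit unit, and anything higher is split into a high surrogate (0xD800-0xDBFF) and a low surrogate (0xDC00-0xDFFF). Reserving those 2048 surrogate values is why the single-unit range is "somewhat less" than 65536. The function name to_utf16_units is made up for this example.

```python
def to_utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit code unit(s) that encode one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        # These values are reserved for surrogates and are not characters.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single 16-bit unit holds the value.
        return [code_point]
    # Supplementary planes: spread the 20-bit offset over two units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


for cp in (0x41, 0xE9, 0x65E5, 0x1F600):
    print(f"U+{cp:04X} ->", [f"{unit:04X}" for unit in to_utf16_units(cp)])
```

Only the last code point in the loop lies above 0xFFFF, so it is the only one that needs two units.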

In Japan and China, to pick a couple examples, UTF-16 is rather efficient.
There are also far worse inefficiencies than using 16 bits to store each
Latin character. In short, I wouldn't get *too* hung up on this point,
especially as the complete lifecycle costs of storage continue to fall.
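
[Illustration, not part of the original message.] The size trade-off is easy to check with Python's built-in codecs. This is a rough sketch only: the sample strings are invented for the example, and "utf-16-le" is used so that no byte order mark is counted. Real text will vary with its mix of characters.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings, chosen only to show the trend.
samples = {
    "English": "Unicode on the mainframe",
    "French": "Où est la bibliothèque ?",
    "Japanese": "メインフレームのユニコード",
}

for language, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{language}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

ASCII letters cost one byte in UTF-8 and two in UTF-16, while kana and most CJK ideographs cost three bytes in UTF-8 and two in UTF-16, which is the distribution effect both messages refer to.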

