Fair enough. I was answering a question about "French Unicode" at five o'clock. I certainly don't mean to get hung up on "efficiency," and yes, for certain character distributions UTF-16 yields a shorter file or message than UTF-8.
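A quick sketch of the size comparison, for anyone who wants to see the numbers. It assumes Python 3; the sample strings and the helper name `encoded_sizes` are made up for illustration and are not from this thread. Sizes are counted without a byte-order mark.

```python
# Illustrative sample strings (assumptions, not data from the thread).
samples = {
    "English": "Unicode on the mainframe",
    "French": "Les élèves français étudient",
    "Japanese": "日本語の文字コード",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    # "utf-16-le" is used so that no BOM is added to the UTF-16 count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Mostly-Latin text comes out smaller in UTF-8, while text made up of CJK characters in the Basic Multilingual Plane takes three bytes per character in UTF-8 against two in UTF-16. A code point above 65535 takes four bytes in either encoding, since UTF-16 needs two 16-bit words for it.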
Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf Of Timothy Sipples
Sent: Thursday, January 09, 2014 11:31 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Subject Unicode

Charles Mills writes:
>You could use 16 bits for every character, with some sort of cleverness
>that yielded two 16-bit words when you had a code point bigger than
>65535 (actually somewhat less due to how the cleverness works). That is
>called UTF-16. Pretty good but still not very efficient.

In Japan and China, to pick a couple examples, UTF-16 is rather efficient. There are also far worse inefficiencies than using 16 bits to store each Latin character. In short, I wouldn't get *too* hung up on this point, especially as the complete lifecycle costs of storage continue to fall.