You are mistaken. The rules for encoding a longer (multi-byte) UTF-8
character are well-defined; see
http://en.wikipedia.org/wiki/UTF-8#Description
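
To illustrate how mechanical those rules are, here is a minimal sketch in
Python. It is my own illustration, not code from the article, and the
function name utf8_encode is made up for this example. The byte layout
depends only on the size of the code point:

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point by hand, following the UTF-8 bit layout."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

The result can be checked against Python's built-in codec, for example
by comparing utf8_encode(ord(ch)) with ch.encode("utf-8") for any
character ch.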

Yes, it is true that for files consisting mostly of Asian and similar
characters, UTF-8 is longer than UTF-16: most such characters take three
bytes in UTF-8 but only two in UTF-16.
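
A quick way to see the size difference for yourself is sketched below in
Python. The helper name encoded_sizes and the sample strings are my own
choices for illustration, not anything from the document John measured,
and the counts exclude any byte-order mark:

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical sample strings, one per language mentioned in the thread.
samples = {
    "English": "Unicode",
    "German": "Größe",
    "Korean": "한국어",
    "Japanese": "日本語",
}

for name, text in samples.items():
    utf8_bytes, utf16_bytes = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

ASCII text comes out half the size in UTF-8, while the Korean and
Japanese samples come out half again as large, so the overall result for
a mixed document depends on the proportions of each.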

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On
Behalf Of John Gilmore
Sent: Friday, January 10, 2014 10:28 AM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: Subject Unicode

Paul,

No, I do not accept the premises you set out.

I will try, when I have more time, to make clear why with examples.

Briefly, effective rules for encoding any 'character' recognized as a
Unicode one as a 'longer' UTF-8 one do not in general exist.
Moreover, even when they are available, my experience with them has been
bad.  In dealing recently with a document containing mixed English, German,
Korean and Japanese text I found that the UTF-8 version was 23% longer than
the UTF-16 version.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
