On 10 January 2014 13:28, John Gilmore <jwgli...@gmail.com> wrote:
> Briefly, effective rules for encoding any 'character' recognized as a
> Unicode one as a 'longer' UTF-8 one do not in general exist.

I am most puzzled to read this. UTF-8 is what Unicode calls a
"transformation format", and the conversion from other encodings of
Unicode characters is strictly (and simply) algorithmic and, by
extension, unambiguous. (In the early Unicode discussions in the 1990s,
some people whose native language was not English objected to the
ambiguity and even untranslatability of the English phrase
"transformation format", but despite that, the conversion remains
algorithmic and definitive.)
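
To make the "algorithmic" point concrete, here is a minimal sketch in
Python of the code point to UTF-8 mapping. The function name utf8_encode
is my own, purely for illustration; in practice one would just use the
built-in str.encode("utf-8"), which should give the same bytes.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper, not a library API)."""
    # Reject values outside the Unicode range and the UTF-16 surrogate block.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Every code point falls into exactly one of the four ranges, so there is
one and only one UTF-8 byte sequence for it.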

> Moreover, even when they are available, my experience with them has
> been bad.  In dealing recently with a document containing mixed
> English, German, Korean and Japanese text I found that the UTF-8
> version was 23% longer than the UTF-16 version.

That I don't doubt at all. Whether UTF-8 is a good format for storage,
transmission, or manipulation of Unicode characters surely varies by
context.
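
The size difference is easy to check for oneself. Hangul, kana and the
common Han characters each take three bytes in UTF-8 but two in UTF-16,
while ASCII letters take one byte in UTF-8 and two in UTF-16, so the
ratio depends on the mix. A rough sketch in Python follows; the helper
name encoded_sizes and the sample strings are mine, not from the
document under discussion.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples only.
for sample in ["Hello", "Gr\u00fc\u00dfe", "\ud55c\uad6d\uc5b4", "\u65e5\u672c\u8a9e"]:
    u8, u16 = encoded_sizes(sample)
    print(f"{sample}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

Mostly-ASCII text comes out shorter in UTF-8, and mostly-CJK text comes
out shorter in UTF-16.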

Tony H.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
