His points, and the points in the serious article he links to, have merit.

It is easy to write UCS-2 (or UTF-16, or whatever it is called) code on the
erroneous theory that every character is 16 bits. It is hard to make the
equivalent assumption with UTF-8. All you are doing with UCS-2 is making it
harder to test the outlier conditions, because the characters that break
the assumption turn up so rarely.
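
To make that concrete, here is a minimal Python sketch (my own illustration,
not something from the article). The helper names utf16_units and utf8_bytes
are made up for this example. It counts how many 16-bit units and how many
bytes a few single characters need. Anything outside the Basic Multilingual
Plane takes two 16-bit units in UTF-16, while UTF-8 is already
variable-length for anything beyond ASCII.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" writes no byte-order mark, so bytes // 2 is the unit count.
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


# Four single code points: ASCII, Latin-1, a BMP symbol, and a non-BMP symbol.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for s in samples:
    print(f"U+{ord(s):04X}: {utf16_units(s)} UTF-16 unit(s), "
          f"{utf8_bytes(s)} UTF-8 byte(s)")
```

Only the last sample (U+1D11E, a musical symbol) breaks the "one character
is 16 bits" theory, which is the point: code built on that theory can pass
a lot of testing before anyone feeds it such a character.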

The article also points out that "what is a character?" is not a simple
question, so it is impossible to say that every character is so many bits,
even in UTF-32. UTF-32 considers "ch" to be two characters, but to Czech
speakers it apparently is only one. Ditto for combining characters. é (hope
that makes it through the listserver) may, I believe, be legitimately
encoded as two "computer" characters (a base letter plus a combining
accent), but everyone considers it culturally to be a single character.
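
A small Python sketch of the combining-character case, using the standard
unicodedata module. This is only my illustration, and the helper name
code_points is made up. It builds é both ways, as the single code point
U+00E9 and as "e" followed by the combining acute accent U+0301, and shows
that the two strings differ in length and compare unequal until they are
normalized to the same form.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "\u00e9"      # e with acute as one precomposed code point
decomposed = "e\u0301"   # plain "e" followed by COMBINING ACUTE ACCENT

print(code_points(composed), len(composed))
print(code_points(decomposed), len(decomposed))

# The two spellings are different strings as far as the computer is concerned.
print(composed == decomposed)

# Normalizing both to the same form (NFC composes, NFD decomposes) makes
# them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Normalization handles the é case, but it does nothing for "ch": that stays
two code points in every normalization form, so whether it counts as one
letter is a matter of language rules rather than encoding.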

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On
Behalf Of John McKown
Sent: Friday, October 04, 2013 5:25 AM
To: [email protected]
Subject: OT? A cause to join, but somewhat humorous

http://www.theregister.co.uk/2013/10/04/verity_stob_unicode/

"Down with Unicode!" <grin/>
