In <[email protected]>, on 10/04/2013
   at 10:31 AM, Charles Mills <[email protected]> said:

>It is easy to write UCS-2 or UTF-16 or whatever it is called

UCS-2 is called UCS-2 and UTF-16 is called UTF-16; they are not the
same.
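
A minimal sketch of the difference, in Python: UTF-16 can represent a
code point outside the BMP by using a surrogate pair, and UCS-2 cannot
represent it at all. The helper name fits_in_ucs2 is hypothetical, and
the check is deliberately simplified (it does not reject lone
surrogates).

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
clef = "\U0001D11E"

# UTF-16 encodes it as two 16-bit code units (a surrogate pair).
utf16 = clef.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])


def fits_in_ucs2(text):
    """Hypothetical helper: True if every code point is in the BMP."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(fits_in_ucs2("\u00e9"))  # a BMP character
print(fits_in_ucs2(clef))      # a supplementary-plane character
```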

>All you are doing with UCS-2 is making it harder to test the outlier 
>conditions.

No. Anything that is an outlier for UCS-2 is also an outlier for
UTF-8, and there is less complexity for UCS-2, since it only covers
the BMP and has a consistent size for each code point. The only real
issues with UCS-2 are the BOM and the fact that it takes more space
for mostly ASCII text.
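
A rough illustration of the size and BOM points, in Python. Python has
no codec named UCS-2, so this sketch assumes "utf-16-be" as a stand-in;
the two produce identical bytes for BMP characters.

```python
import codecs

# Mostly-ASCII text doubles in size with 16-bit code units.
ascii_text = "IBM-MAIN"
utf8_len = len(ascii_text.encode("utf-8"))
ucs2_len = len(ascii_text.encode("utf-16-be"))  # stand-in for UCS-2
print(utf8_len, ucs2_len)

# UTF-8 width varies by code point; the 16-bit form stays at 2 bytes
# for every BMP character.
widths = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-be")))
    for ch in ("A", "\u00e9", "\u20ac")
}
print(widths)

# The "utf-16" codec (no explicit byte order) prepends a BOM in the
# platform's native byte order, so a reader can tell BE from LE.
with_bom = "A".encode("utf-16")
print(with_bom[:2] == codecs.BOM_UTF16)
```

The three sample characters take 1, 2 and 3 bytes in UTF-8 and exactly
2 bytes each in the 16-bit form.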

>The article also points out that "what is a character?" is not a simple 
>question,

It is in Unicode; a character is not the same as a glyph.

>Ditto for combining characters. é (hope that makes it through the
>listserver) may I believe be legitimately encoded as two "computer"
>characters, but everyone considers it culturally to be a single
>character.

U+0065 U+0301 is two characters, even though you would normally render
it with the same glyph as U+00E9. See "The Unicode Standard, Version
5.0".
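
A short sketch using Python's standard unicodedata module shows the two
spellings and how normalization maps one to the other. Note that the
combining acute accent is U+0301; U+00B4 is the spacing ACUTE ACCENT,
which does not combine with the preceding letter.

```python
import unicodedata

decomposed = "e\u0301"    # U+0065 followed by U+0301
precomposed = "\u00e9"    # U+00E9

# Two characters versus one, and the strings do not compare equal.
print(len(decomposed), len(precomposed), decomposed == precomposed)

# Normalization converts between the two canonically equivalent forms.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)

# The character names and combining classes show which accent combines.
print(unicodedata.name("\u0301"), unicodedata.combining("\u0301"))
print(unicodedata.name("\u00b4"), unicodedata.combining("\u00b4"))
```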

-- 
     Shmuel (Seymour J.) Metz, SysProg and JOAT
     ISO position; see <http://patriot.net/~shmuel/resume/brief.html> 
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
