srintuar wrote:
> FWIW, I'd assert that "j" in Spanish is not the same thing as
> "j" in English (and that one is easily proved), apart from them being
> represented with the same *glyph*.

You picked (certainly involuntarily) a very instructive example. I live in Spain, so I feel qualified to offer an opinion on this one.

While my uses (note the plural) of "j" in "Spanish" are different from my use (note the singular) in English, there is much more difference between my use of "jota" in Castilian (a form of "Spanish" where "j" is pronounced as a laryngeal, similar to "х/h" for Danilo; sorry, I do not know Vietnamese) and my use of "jota" (the letter does not change its name) in Catalan (another form of "Spanish" where "j" is pronounced more or less as in French, similar to "ж/ž" for Danilo); and in Valencian (a variant of Catalan, so another form of "Spanish", spoken where I live) it is pronounced as an affricate, that is... as in English.

Now, the very interesting thing is that people here, when they do not know the language of the context, use... their local pronunciation; so the *same* jota is pronounced differently by different Spaniards. As a practical example, the name of the letter itself, jota, is pronounced /xota/ (/хота/) in Castilla, /ʒɔtə/ (/жота/) in Barcelona and /d͡ʒɔta/ (/джота/) here in Valencia.

And of course, NOBODY is willing to have three different Js on her keyboard (plus another one to write German, as a bonus).

> Certainly the character is used differently. However, I would assert
> that it is indeed the same character. Both English and Spanish
> use latin script.

The Unicode analysis here is that they are the same, since there is a continuum of uses that embraces both languages (in other words, you will not encounter systemic differences inside a given language, even if you can encounter systemic differences *between* languages). On the other hand, they decided that there are systemic differences between A and А (Latin/Cyrillic). Also, in the case of "j", the fact is that one can trace it back through the evolution of the script(s), and all forms of "j" have a common ancestor (no earlier than the 16th century).

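To make the Latin/Cyrillic point concrete, here is a minimal sketch using Python's standard unicodedata module. The variable names are mine, chosen only for illustration. It shows that "j" is one code point whatever language the text is in, while the look-alike capital A of Latin and of Cyrillic are encoded separately.

```python
import unicodedata

# "j" is a single code point, whatever the language of the text.
latin_j = "j"

# Latin and Cyrillic capital A usually share a glyph, but Unicode
# encodes them as two distinct characters.
latin_a = "A"
cyrillic_a = "\u0410"

for ch in (latin_j, latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The two As never compare equal, even though they look alike.
print(latin_a == cyrillic_a)
```

Nothing in the character data says "Spanish j" or "English j"; the language has to come from somewhere else (markup, locale, or the reader).
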
> Also, imagine the chaos for OCR programs: you'd have to tell them
> ahead of time which language they are supposed to read in.

This is an aside, but you already have to tell them: the software uses that information to select one dictionary over another, and this improves the result by a very wide margin. For example, until you tell the OCR software that you are reading Vietnamese, it will discard any marks it "sees" below the vowels as meaningless.

Antoine

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/