On 01/15/2011 05:59 PM, Steven Schveighoffer wrote:
I think this is a good alternative, but I'd rather not impose this on
people like myself who deal mostly with English.  I think this should be
possible to do with wrapper types or intermediate ranges which have
graphemes as elements (per my suggestion above).

I am now unsure about how a text's (apparent) natural language relates to Unicode issues. English in particular often seems to include foreign words verbatim (or is that a kind of pedantry from highly educated people?). In fact, users are free to include whatever characters they like, as long as their text-composition interface allows it. All main OSes, I guess, now have at least one standard way to type characters (or code points) that are not directly accessible on the keyboard, and applications sometimes offer another. Some kinds of users love to play with such flexibility.

So maybe the right question is not one of natural language but of text-composition means. I guess that as soon as a human user may have freely typed or edited a text, we cannot guarantee much about its actual content. What do you think? The case of historic ASCII-only text is indeed relevant, but it will quickly become less so. And how does an application writer recognise such text without iterating over the whole content? (ASCII is compatible with UTF-8, so the encoding alone does not tell.)
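To make the last question concrete: because every ASCII byte is also valid UTF-8, nothing in the encoding itself marks a text as ASCII-only, so a check has to look at the bytes. Here is a minimal sketch in Python (not D; the function names `is_ascii_only` and `first_non_ascii` are made up for illustration). It assumes the text is available as UTF-8 bytes.

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. the text is pure ASCII."""
    # In UTF-8, any byte >= 0x80 belongs to a multi-byte sequence,
    # so a single such byte is enough to rule out ASCII-only content.
    return all(b < 0x80 for b in data)


def first_non_ascii(data: bytes) -> int:
    """Return the index of the first non-ASCII byte, or -1 if there is none."""
    for i, b in enumerate(data):
        if b >= 0x80:
            return i
    return -1


if __name__ == "__main__":
    english = "plain English text".encode("utf-8")
    loanword = "a naïve example".encode("utf-8")
    print(is_ascii_only(english))
    print(is_ascii_only(loanword))
    print(first_non_ascii(loanword))
```

The scan can stop at the first non-ASCII byte, but confirming that a text really is ASCII-only still costs one pass over all of it, unless that fact was recorded when the text was created.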

Denis
_________________
vita es estrany
spir.wikidot.com
