On 01/15/2011 08:51 PM, Steven Schveighoffer wrote:
> Moreover, even if you ignore Hebrew as a tiny insignificant minority,
> you cannot do the same for Arabic, which has over one *billion* people
> who use that language.
>
> I hope that the medium type works 'good enough' for those languages,
> with the high-level type needed for advanced usages.  At a minimum,
> comparison and substring should work for all languages.

Hello Steven,

How does an application know that a given text, which is supposedly written in a given natural language (as indicated, for instance, by an HTML header), does not also contain terms from other languages? There are various occasions for this: quotations, use of foreign words, pointers...
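As a rough illustration of why one cannot rely on a document-level language tag, here is a small sketch (Python, only because its standard library is handy for this; the function and the name-based heuristic are my own, not anything a library provides) that reports which scripts appear in a piece of text:

import unicodedata

def scripts_used(text):
    """Return the set of scripts (crudely inferred from Unicode character
    names) occurring in `text`.  Purely illustrative."""
    scripts = set()
    for ch in text:
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            continue  # skip whitespace and punctuation
        name = unicodedata.name(ch, "")
        # The first word of a character's name (LATIN, HEBREW, ARABIC, ...)
        # is used here as a stand-in for the real Script property.
        if name:
            scripts.add(name.split()[0])
    return scripts

# A nominally "English" sentence that still contains Hebrew and Arabic:
print(scripts_used("The word shalom (\u05e9\u05dc\u05d5\u05dd) and salaam (\u0633\u0644\u0627\u0645)."))
# prints something like {'LATIN', 'HEBREW', 'ARABIC'} (set order varies)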

A side issue is raised by precomposed codes for composite characters. For most languages of the world, I guess (but am unsure), all "official" characters have single-code-point representations. Good, but unfortunately this is not enforced by the standard (instead, the decomposed form can sensibly be considered the base form, but that is another topic). So even if one knows for sure that all characters of all texts an app will ever deal with can be mapped to single code points, to be safe one would have to normalise to NFC anyway (Normalization Form C, composed). Then, where is the actual gain? In fact, it is a loss, because NFC is more costly than NFD (Form D, decomposed): the standard NFC algorithm first decomposes to NFD to obtain a unique representation, which it can then (re)compose via simple mappings.
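To make the comparison/substring point concrete, here is a small sketch (again Python, simply because normalisation ships in its standard library; the same applies in any language) showing that a precomposed 'é' and its decomposed equivalent compare unequal as raw code points, and that normalising to either form restores a canonical representation:

import unicodedata

precomposed = "caf\u00e9"      # 'é' as a single code point, U+00E9
decomposed  = "cafe\u0301"     # 'e' followed by COMBINING ACUTE ACCENT, U+0301

# Naive code-point comparison sees two different strings:
print(precomposed == decomposed)             # False
print(len(precomposed), len(decomposed))     # 4 5

# After normalisation (either form), comparison and substring search behave:
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                        # True
print("f\u00e9" in nfc_b)                    # True

# NFD is the cheaper target form; NFC conceptually decomposes first,
# then recomposes via the canonical composition mappings.
print([hex(ord(c)) for c in unicodedata.normalize("NFD", precomposed)])
# ['0x63', '0x61', '0x66', '0x65', '0x301']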

For further information:
Unicode's normalisation algos: http://unicode.org/reports/tr15/
list of technical reports: http://unicode.org/reports/
(Unicode's technical reports are far more readable than the standard itself, but unfortunately they often refer to it.)

Denis
_________________
vita es estrany
spir.wikidot.com
