On Mon, 13 Nov 2006, [EMAIL PROTECTED] wrote:

[Sorry if this messes up the Unicode characters; Pine is lame and doesn't support UTF-8.]

The base characters themselves certainly fit. However, if one wishes to
operate on syllables (made by combining consonants in the base
character set), the number of these syllables can exceed 256.
Here is a short example of just one of the issues that come up when
treating characters, rather than syllables, as the base unit in Hindi.
Take, for example, the conjunct "kra", क्र. This is represented
linguistically, and in Unicode, as क + ् + र (U+0915 + U+094D + U+0930).
It makes no sense to swap the "halant" (U+094D) with the "ka" or the
"ra", as that creates a completely different conjunct, and is not a
mistake that would typically be made. As you suggest, I could just
include "kra" in the encoding, but, in many Indian languages, the
256 available slots are not sufficient for all such conjuncts.

I am going to need a better explanation.

So "kra" is stored in Unicode using three "characters", but you want to store it as a single "kra" conjunct, which is not the way it is normally stored? What is the Unicode character for "kra"?
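
For reference, here is a minimal Python sketch of how that sequence looks at the code point level. It is only an illustration, not anything Aspell does: the variable names are made up, and it assumes Python 3 with the standard unicodedata module. It lists the three code points and checks that NFC normalization does not fold them into a single precomposed character.

```python
import unicodedata

# The "kra" conjunct as a sequence of three code points:
# KA + VIRAMA (the "halant") + RA.
kra = "\u0915\u094D\u0930"

# Code points in U+XXXX notation, with their Unicode character names.
codepoints = [f"U+{ord(ch):04X}" for ch in kra]
names = [unicodedata.name(ch) for ch in kra]

for cp, name in zip(codepoints, names):
    print(cp, name)

# Number of code points versus number of bytes in UTF-8.
n_codepoints = len(kra)
n_utf8_bytes = len(kra.encode("utf-8"))
print(n_codepoints, n_utf8_bytes)

# NFC normalization leaves the sequence unchanged, i.e. it is not
# replaced by one precomposed "kra" character.
nfc_unchanged = unicodedata.normalize("NFC", kra) == kra
print(nfc_unchanged)
```

So a table indexed by syllable would have to treat the whole three-code-point sequence as one unit, rather than rely on any single Unicode character.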


_______________________________________________
Aspell-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/aspell-devel
