Forwarding to this mailing list. As Gora said, it is strange to make users know about ZWJ and ZWNJ, but there is no other way to solve the specific problem of chillaksharas. We are trying to minimize this problem by using transliteration-based keyboards. Inscript keyboard layout users must know about these codepoints as of now. There is a discussion going on in the Unicode mailing list about whether we need to assign Unicode code points for chillaksharas or not.
Anyway, can somebody suggest a solution for the aspell-malayalam problem mentioned in the mail below?

-santhosh

--------
From: Gora Mohanty via RT <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Mon, Jul 2, 2007 at 2:29 PM

Dear Santhosh,

I remember our discussing this, and I am sorry that I got awfully busy with other things. I will take a look at the code for the Malayalam dictionary that you sent me. If you remember, I had suggested that we filter out ZWJ/ZWNJ from both the words in the dictionary and the input words to be spell-checked.

I find it a little strange that you are expecting end-users to be aware of what ZWJ/ZWNJ are, and how to enter them correctly. However, it is probably my understanding of Malayalam chillaksharas that is at fault here, as I am given to understand that all open-source renderers will now have to start taking this into account.

There are two problems here, the first being that ZWJ/ZWNJ have to be assigned to currently-vacant Unicode Malayalam codepoints. You are right that this will cause problems later on, should those codepoints get assigned, but changing the internal aspell encoding is not too difficult a task. The second problem is that assigning ZWJ/ZWNJ in this manner does not seem to work in aspell, probably because it is already aware of the existence of these. Kevin, can you shed more light on this? This discussion should also probably be taken to the aspell-devel mailing list.

Regards,
Gora

------------------------
From: Santhosh Thottingal <[EMAIL PROTECTED]>
To: Kevin Atkinson <[EMAIL PROTECTED]>
Date: Mon, Jul 2, 2007 at 12:48 PM

Hi,

I am working on the Aspell Malayalam wordlist preparation. I am facing a problem related to ZWJ and ZWNJ. In Malayalam, the usage of ZWJ and ZWNJ is very common for a particular set of letters named "chillus". The u-mlym.txt contains the following entry.

    0x80..0xFF = U+0D00..U+0D7F

When we try to prepare the wordlist using this, all the words with ZWJ and ZWNJ are rejected. Some of my friends suggested using unused code points in the Malayalam block for U+200C and U+200D, like this:

    0x80..0x81 = U+200C..U+200D
    0x82..0xFF = U+0D02..U+0D7F

But when I asked about this on the Unicode mailing list, they discouraged this approach, since these unused points might be assigned in the future, and at that point our application will break. Could you please suggest a solution for this?

Thanks,
Santhosh Thottingal
Swathanthra Malayalam Computing.

_______________________________________________
Aspell-devel mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/aspell-devel
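
Editorial note: Gora's suggestion in the thread above, filtering ZWJ/ZWNJ out of both the dictionary words and the words being checked, could be sketched roughly as below. This is only an illustration of the idea as a preprocessing step outside aspell; it does not use or describe aspell's own API, and the function names (strip_joiners, build_wordlist, is_known) are hypothetical. The sample word assumes a chillu written as consonant + virama + ZWJ.

```python
# Minimal sketch, NOT aspell code: all names here are hypothetical.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Translation table that deletes both joiners from a string.
_STRIP_TABLE = str.maketrans("", "", ZWNJ + ZWJ)


def strip_joiners(word: str) -> str:
    """Return the word with every ZWJ and ZWNJ removed."""
    return word.translate(_STRIP_TABLE)


def build_wordlist(words):
    """Normalize dictionary words before they are stored."""
    return {strip_joiners(w) for w in words}


def is_known(word: str, wordlist) -> bool:
    """Normalize the input word the same way, then look it up."""
    return strip_joiners(word) in wordlist


# Assumed example: MALAYALAM LETTER NA + VIRAMA + ZWJ (a chillu sequence).
chillu_n = "\u0d28\u0d4d\u200d"
wordlist = build_wordlist([chillu_n])

print(is_known(chillu_n, wordlist))        # input typed with the joiner
print(is_known("\u0d28\u0d4d", wordlist))  # input typed without the joiner
```

With this normalization the stripped forms fit the existing 0x80..0xFF = U+0D00..U+0D7F mapping, since no U+200C or U+200D remains. The trade-off, by construction, is that two words differing only in a joiner become indistinguishable to the checker.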
