Hi,

Hunspell can use the first 65k characters of the Unicode standard in affix rules. Moreover, Hunspell has a COMPLEXPREFIXES feature to handle agglutinative right-to-left languages by twofold prefix (here: left affix) splitting. See the tests/complexprefixes.* and tests/complexprefixesutf.* examples in the Hunspell distribution (http://hunspell.sourceforge.net).

About the Unicode problem: some UTF-8 editors, for instance Notepad on Windows, write a byte order mark (BOM) at the start of the file, and the editor itself does not display it. Unfortunately, Hunspell doesn't remove this 3-byte sequence from the beginning of the affix file. The solution is to remove the BOM in a byte editor, or simply to leave the first line of the affix file empty.

Regards,
Laci

On 1/5/07, Erdal Ronahi <[EMAIL PROTECTED]> wrote:
Hi,

after the recent discussion about the Farsi (Persian) spellchecker on the native-lang list I decided to try out some things. Basically I want to create a Hunspell dictionary for Kurdish with Arabic script, which is in a lot of ways similar to Farsi.

But I ran into some problems. I could not get any affix working. I tried a lot of different things, nothing worked. I am familiar with the concept of affixes in MySpell/Hunspell, so my question is:

Has ANYbody EVER got affixes working in a right-to-left language with Hunspell in UTF-8? Or has anybody an idea whether it should work or not?

I have not found any example. Farsi has just a plain wordlist, Hebrew is not UTF-8. I haven't seen an Arabic Hunspell dictionary anywhere around. And my own Kurdish/Sorani affix file stubbornly refuses to work.

Regards,
Erdal
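
For anyone who would rather script the BOM workaround mentioned in the reply than open a byte editor, here is a minimal sketch in Python. It assumes the affix file is UTF-8 and small enough to read into memory; the function names and the file name in the usage note are made up for illustration and are not part of Hunspell.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    # codecs.BOM_UTF8 is the 3-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(path: str) -> bool:
    """Rewrite the file at path without its BOM; return True if it was changed."""
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        p.write_bytes(cleaned)
        return True
    return False
```

Calling strip_bom_from_file("ku.aff") (a hypothetical file name) would rewrite the file in place only if it starts with a BOM, so it is safe to run repeatedly. Keeping a backup copy of the affix file first is probably wise.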
