Re: [native-lang] Hunspell and Affixes in Right-to-left Languages

Németh László Fri, 05 Jan 2007 00:08:08 -0800

Hi,

Hunspell can use the first 65,536 characters of the Unicode standard (the
Basic Multilingual Plane, U+0000 to U+FFFF) in affix rules.
Moreover, Hunspell has a COMPLEXPREFIXES feature to handle agglutinative
right-to-left languages by twofold prefix (here: left affix) splitting. See
the tests/complexprefixes.* and tests/complexprefixesutf.* examples in the
Hunspell distribution (http://hunspell.sourceforge.net).
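Because only the first 65,536 code points are usable in affix rules, it can
help to scan a UTF-8 affix file for anything beyond U+FFFF before testing.
The following is a minimal Python sketch, not part of Hunspell; the helper
names non_bmp_chars and check_affix_file are hypothetical, and it assumes the
affix file is UTF-8 encoded (declared with SET UTF-8).

```python
import sys

BMP_LIMIT = 0xFFFF  # last code point of the Basic Multilingual Plane


def non_bmp_chars(text: str) -> list[tuple[int, str]]:
    """Return (line_number, character) pairs for characters above U+FFFF."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > BMP_LIMIT:
                hits.append((lineno, ch))
    return hits


def check_affix_file(path: str) -> list[tuple[int, str]]:
    # Assumption: the affix file is UTF-8; a non-UTF-8 file raises
    # UnicodeDecodeError here.
    with open(path, encoding="utf-8") as f:
        return non_bmp_chars(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    for lineno, ch in check_affix_file(sys.argv[1]):
        print(f"line {lineno}: U+{ord(ch):04X} is outside the BMP")
```

For example, python check_bmp.py xx_XX.aff (hypothetical script and file
names) prints one line per offending character. Arabic-script letters such as
U+0627 lie well inside the BMP and are not reported.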


About the Unicode problem: some UTF-8 editors, for instance Notepad on
Windows, write a byte order mark (BOM) at the start of the file that is
invisible in the editor. Unfortunately, Hunspell doesn't remove this 3-byte
sequence (EF BB BF) at the beginning of the affix file. The solution is to
remove the BOM with a byte editor, or simply to leave the first line of the
affix file empty.
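If a byte editor is not at hand, the BOM can also be stripped with a few
lines of Python. This is a minimal sketch assuming a UTF-8 affix file; the
helper names strip_utf8_bom and strip_bom_in_place are hypothetical, and the
script touches nothing but the leading three bytes EF BB BF.

```python
import codecs
import sys


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark (EF BB BF)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # no BOM present, leave the file untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        if strip_bom_in_place(name):
            print(f"{name}: BOM removed")
```

Run it as python strip_bom.py xx_XX.aff (hypothetical script and file names).
Working on raw bytes rather than decoded text keeps the rest of the file
byte-for-byte identical.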

Regards,

Laci

On 1/5/07, Erdal Ronahi <[EMAIL PROTECTED]> wrote:


Hi,

after the recent discussion about the Farsi (Persian) spellchecker on
the native-lang list I decided to try out some things. Basically I
want to create a hunspell dictionary for Kurdish with Arabic script,
which is in a lot of ways similar to Farsi. But I ran into some
problems.

I could not get any affix working. I tried a lot of different things;
nothing worked. I am familiar with the concept of affixes in
MySpell/Hunspell, so my question is:

Has ANYbody EVER got affixes working in a right-to-left language with
Hunspell in UTF-8? Or has anybody an idea whether it should work or
not?

I have not found any example. Farsi has just a plain wordlist, Hebrew
is not UTF-8. I haven't seen an Arabic Hunspell dictionary anywhere
around. And my own Kurdish/Sorani affix file stubbornly refuses to
work.

Regards,
Erdal
