Hi,

On Mar 7, 2005, at 4:18 PM, Luke Myers wrote:

> Thanks for clearing up spell checking for me. I had an idea of how it worked but no real specifics. This along with the link to the article on affix file format (http://lingucomponent.openoffice.org/affix.readme) were possibly the most important posts recently. I'd really like another article/tutorial on the subject and I know they exist, but can't find them. Is there a site index for the lingucomponent page?


No, unfortunately there is no site index; the lingucomponent site is not that big. The best way to play around and figure this out is to grab the standalone MySpell from a link on that site, unzip it, and read the READMEs and other files that come with it.



Several other interesting problems complicate the life of a spellchecker:


1. how to generate acceptable suggestions
   (see the suggestmgr.cxx code for lots of ideas, including)
     - single-character insertions, deletions, etc.
     - multi-character replacement tables for common misspellings
     - sets of related characters that are often swapped
     - n-gram scoring for words that may have many multi-character errors

     Aspell actually uses phonetic suggestions!


2. how to handle compound words (there are so many rules for compounding!)
   MySpell does NOT do this well, but Hunspell might be worth a look.



3. how to handle different character encodings

4. how to spell-check numbers, currency, etc. (MySpell does not do this well either)


I would be happy to give an overview of any of these ideas, or to answer questions about the code for anyone interested enough to try looking at it.


Hope this helps,

Kevin


