Hi,

FYI: unmunching any single word against an affix file is fast enough that, instead of pre-expanding the root/word list, you could simply borrow the pieces of myspell code that take a word and find its root with affix flags, and then expand that root for all affixes on the fly, so to speak (at least for English).

Effectively, you would simply spellcheck each word in the search query (which can be done on the fly while typing, just like in OOo). That identifies the word's entry in the hash table built from the .dic file, and you could then expand that entry on the fly, using the .aff information held in memory, to create the fuzzy word list for each word if you wanted.
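To make the on-the-fly idea concrete, here is a minimal sketch under a toy model: suffix rules only (no prefixes, no cross-products), with rules loosely modelled on English .aff SFX lines (strip, add, condition). The names SuffixRule, RULES, DICTIONARY, find_roots, expand and fuzzy_words are hypothetical and are not myspell's API.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SuffixRule:
    flag: str       # affix flag letter attached to roots in the .dic file
    strip: str      # text removed from the end of the root ("0" in .aff means "")
    add: str        # text appended after stripping
    condition: str  # regex that the end of the root must match

    def applies_to(self, root):
        return root.endswith(self.strip) and re.search(f"(?:{self.condition})$", root) is not None

    def apply(self, root):
        return root[: len(root) - len(self.strip)] + self.add

# Hypothetical in-memory stand-ins for a parsed .aff and .dic file.
RULES = [
    SuffixRule("D", "", "d", "e"),
    SuffixRule("D", "y", "ied", "[^aeiou]y"),
    SuffixRule("D", "", "ed", "[^ey]"),
    SuffixRule("D", "", "ed", "[aeiou]y"),
    SuffixRule("S", "y", "ies", "[^aeiou]y"),
    SuffixRule("S", "", "s", "[aeiou]y"),
    SuffixRule("S", "", "es", "[sxzh]"),
    SuffixRule("S", "", "s", "[^sxzhy]"),
]
DICTIONARY = {"try": "DS", "walk": "DS", "bake": "DS"}  # root -> affix flags

def find_roots(word):
    """The 'spellcheck' step: every dictionary root that can generate `word`."""
    roots = {word} if word in DICTIONARY else set()
    for rule in RULES:
        if not word.endswith(rule.add):
            continue
        # Undo the rule: drop the added text, restore the stripped text.
        candidate = word[: len(word) - len(rule.add)] + rule.strip
        flags = DICTIONARY.get(candidate)
        if flags and rule.flag in flags and rule.applies_to(candidate) and rule.apply(candidate) == word:
            roots.add(candidate)
    return roots

def expand(root):
    """Generate the root plus every word its affix flags allow."""
    flags = DICTIONARY[root]
    words = {root}
    for rule in RULES:
        if rule.flag in flags and rule.applies_to(root):
            words.add(rule.apply(root))
    return words

def fuzzy_words(word):
    """Union of all forms of all roots; an empty set means 'probably misspelled'."""
    result = set()
    for root in find_roots(word):
        result |= expand(root)
    return result
```

With this toy data, fuzzy_words("tried") yields {"try", "tried", "tries"}, while fuzzy_words("walkd") yields an empty set because no root and rule combination produces it. A real implementation would also need prefixes, cross-product handling and the full .aff condition syntax.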

Another nice feature of using a spellchecker with affix compression this way is that you would catch typos and could very easily offer suggestions to replace mistyped words.
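For the suggestion side, one common technique (not claimed to be exactly what myspell does) is to generate every string one edit away (delete a letter, swap adjacent letters, replace a letter, insert a letter) and keep only those the dictionary recognises. KNOWN_WORDS is a hypothetical stand-in for the expanded word list.

```python
import string

# Hypothetical word list; in practice this check would go through the
# .dic/.aff lookup rather than a pre-expanded set.
KNOWN_WORDS = {"try", "tried", "tries", "walk", "walked", "walks", "bake", "baked", "bakes"}

def single_edits(word, alphabet=string.ascii_lowercase):
    """All strings exactly one delete, adjacent swap, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | swaps | replaces | inserts

def suggest(word, known=KNOWN_WORDS):
    """Return sorted suggestions for a mistyped word, or [] if it is correct."""
    if word in known:
        return []
    return sorted(single_edits(word) & known)
```

For example, suggest("walkd") returns ["walk", "walked", "walks"] and suggest("tired") returns ["tried"].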

In fact, you could just incorporate myspell (which is BSD licensed), or any other spellchecker with a compatible license, into your search code as a library and get all of these features.

My 2 cents,

Kevin



On Apr 12, 2007, at 5:02 AM, Oleg Burlaca wrote:

Kevin B. Hendricks wrote:
Please remember that unmunch does not guarantee a one-to-one mapping between words and root forms. For example, an unmunched word may be generated more than once, by several different combinations of root words and affixes.

That is why the unmunched list of words is typically uniquely sorted to remove duplicates.
It's OK if the same word is generated several times. I wanted to generate a list:
root1,word11
root1,word12
root1,word13
root2,word21
...
and feed this list to the mnoGoSearch search engine to enable fuzzy search, i.e. when searching for word12, the search engine would also find documents containing word11 and word13.
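A root,word list like this could be turned into fuzzy-search groups along the following lines. This is a sketch assuming plain "root,word" lines; it does not use any mnoGoSearch file format, and build_index and related_forms are hypothetical names. It also reflects the caveat above: duplicate lines collapse into sets, and a word generated from two roots (here "axes", from both "ax" and "axe") maps to both.

```python
from collections import defaultdict

def build_index(lines):
    """Parse 'root,word' lines into (root -> words, word -> roots), dropping duplicates."""
    words_by_root = defaultdict(set)
    roots_by_word = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        root, word = (part.strip() for part in line.split(",", 1))
        words_by_root[root].add(word)
        roots_by_word[word].add(root)
    return words_by_root, roots_by_word

def related_forms(query, words_by_root, roots_by_word):
    """All words sharing a root with `query` (the query itself is always included)."""
    forms = {query}
    for root in roots_by_word.get(query, ()):
        forms |= words_by_root[root]
    return sorted(forms)

# Hypothetical sample, including a duplicate line and an ambiguous word.
LINES = ["ax,ax", "ax,axes", "axe,axe", "axe,axes", "axe,axed", "ax,axes"]
WORDS_BY_ROOT, ROOTS_BY_WORD = build_index(LINES)
```

Here related_forms("axed", WORDS_BY_ROOT, ROOTS_BY_WORD) gives ["axe", "axed", "axes"], while "axes" pulls in both families. Whether an ambiguous word should merge every root's family, as this sketch does, or be restricted somehow is a design choice that trades precision for recall.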

Kevin, thanks for the comments.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: dev-[EMAIL PROTECTED]
