Hi,
FYI: the unmunch algorithm for any one word and affix file is quite
fast, so instead of pre-expanding the root/word list you could
simply take the pieces of code from myspell that take a word and
find its root with affix flags, and then expand that root for all of
its affixes on the fly, so to speak (at least for English).
Effectively, you simply spellcheck each word in the search query (which
can be done on the fly while typing, just like in OOo). That
identifies the entry in the hash table formed from the .dic file, which
you can then expand on the fly using the .aff info stored in memory to
create the fuzzy word list for each word, if you wanted.
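
To make the idea concrete, here is a rough Python sketch of "find the
root, then expand it on the fly". It is not myspell code: the Affix
tuple, the AFFIXES and DICTIONARY tables and the function names are
all made up for illustration, the rules are a toy subset of English,
and prefix/suffix cross products are ignored.

```python
import re
from typing import NamedTuple

class Affix(NamedTuple):
    kind: str   # "PFX" or "SFX"
    strip: str  # characters removed from the root before adding
    add: str    # characters added to the root
    cond: str   # regex the root must match for the rule to apply

# Toy stand-ins for a parsed .aff file and the .dic hash table (hypothetical data).
AFFIXES = {
    "S": [Affix("SFX", "y", "ies", r"[^aeiou]y$"),
          Affix("SFX", "", "s", r"[aeiou]y$"),
          Affix("SFX", "", "es", r"[sxzh]$"),
          Affix("SFX", "", "s", r"[^sxzhy]$")],
    "G": [Affix("SFX", "e", "ing", r"e$"),
          Affix("SFX", "", "ing", r"[^e]$")],
    "R": [Affix("PFX", "", "re", r"^.")],
}
DICTIONARY = {"search": "SGR", "query": "S", "type": "SG"}  # root -> affix flags

def expand(root, flags):
    """Generate every form of `root` allowed by its affix flags."""
    forms = {root}
    for flag in flags:
        for a in AFFIXES.get(flag, []):
            if not re.search(a.cond, root):
                continue
            if a.kind == "SFX":
                forms.add(root[:len(root) - len(a.strip)] + a.add)
            else:
                forms.add(a.add + root[len(a.strip):])
    return sorted(forms)

def find_root(word):
    """Undo one affix and look the candidate up; return (root, flags) or None."""
    if word in DICTIONARY:
        return word, DICTIONARY[word]
    for flag, rules in AFFIXES.items():
        for a in rules:
            if a.kind == "SFX" and word.endswith(a.add):
                cand = word[:len(word) - len(a.add)] + a.strip
            elif a.kind == "PFX" and word.startswith(a.add):
                cand = a.strip + word[len(a.add):]
            else:
                continue
            # The candidate must be a known root that carries this flag
            # and satisfies the rule's condition.
            if flag in DICTIONARY.get(cand, "") and re.search(a.cond, cand):
                return cand, DICTIONARY[cand]
    return None

def fuzzy_list(word):
    """Spellcheck `word`, then expand its root into the fuzzy word list."""
    hit = find_root(word)
    return expand(*hit) if hit else []

print(fuzzy_list("typing"))
```

With these toy tables, a query word such as "typing" is traced back to
the root "type" and expanded from there, so nothing has to be
pre-generated on disk.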
Another nice feature of using a spellchecker with affix compression
in that way is that you would catch typos and could very easily offer
suggestions to replace mistyped words.
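
As a rough illustration of the suggestion side, the Python sketch
below flags query words that are not in a word list and proposes
replacements. Note that it swaps in difflib.get_close_matches from the
Python standard library as a stand-in; it is not the suggestion
algorithm myspell uses, and KNOWN_WORDS and check_query are
hypothetical names.

```python
import difflib

# Hypothetical word list; in practice this would come from the dictionary.
KNOWN_WORDS = ["search", "searches", "searching", "query", "queries", "engine"]

def check_query(query):
    """Return {unknown_word: [suggestions]} for each word not in KNOWN_WORDS."""
    report = {}
    for word in query.lower().split():
        if word not in KNOWN_WORDS:
            # Rank known words by similarity to the unknown word.
            report[word] = difflib.get_close_matches(
                word, KNOWN_WORDS, n=3, cutoff=0.75)
    return report

print(check_query("serch engine"))
```

Here "engine" passes the check and "serch" comes back with "search" as
its best suggestion.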
In fact, you could just incorporate myspell as a library (it is BSD
licensed), or any other spellchecker with a compatible license, into
your search code and get all of these features.
My 2 cents,
Kevin
On Apr 12, 2007, at 5:02 AM, Oleg Burlaca wrote:
Kevin B. Hendricks wrote:
Please remember that unmunch does not guarantee a one-to-one
mapping between words and root forms. For example, an unmunched
word may be generated by many different root words and affixes, and
so may appear more than once.
That is why the unmunched list of words is typically uniquely
sorted to remove duplicates.
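
A small Python sketch of that effect, using a toy suffix-only unmunch
(the ENTRIES and SUFFIXES tables and the unmunch function here are
invented for illustration and are not the real tool):

```python
# Hypothetical .dic entries (root -> flags) and suffix table (flag -> suffix).
ENTRIES = {"work": "SR", "worker": "S"}
SUFFIXES = {"S": "s", "R": "er"}

def unmunch(entries, suffixes):
    """Return every generated word, duplicates included."""
    words = []
    for root, flags in entries.items():
        words.append(root)
        words.extend(root + suffixes[flag] for flag in flags)
    return words

raw = unmunch(ENTRIES, SUFFIXES)
unique = sorted(set(raw))  # the "uniquely sorted" step

print(raw)
print(unique)
```

"worker" is produced twice, once as a root in its own right and once
as "work" plus a suffix, which is why the raw output needs the unique
sort.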
It's ok that the same word will be generated several times. I
wanted to generate a list:
root1,word11
root1,word12
root1,word13
root2,word21
...
and to feed this list to the mnoGoSearch search engine in order to
enable fuzzy search.
i.e. when searching for word12, the search engine will also find
docs with word11, word13.
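
The lookup this describes can be sketched in Python as follows. This
only illustrates the root,word grouping; it says nothing about the
input format mnoGoSearch actually expects, and load_pairs and related
are hypothetical helper names.

```python
from collections import defaultdict

PAIRS_TEXT = """root1,word11
root1,word12
root1,word13
root2,word21
"""

def load_pairs(text):
    """Parse 'root,word' lines into root->words and word->roots maps."""
    words_of = defaultdict(set)
    roots_of = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        root, word = (field.strip() for field in line.split(",", 1))
        words_of[root].add(word)
        roots_of[word].add(root)  # a word may belong to several roots
    return words_of, roots_of

def related(word, words_of, roots_of):
    """Return all words sharing at least one root with `word`."""
    found = set()
    for root in roots_of.get(word, ()):
        found |= words_of[root]
    return sorted(found)

words_of, roots_of = load_pairs(PAIRS_TEXT)
print(related("word12", words_of, roots_of))
```

Because a word is mapped to a set of roots, a word that was generated
from several roots simply picks up the forms of all of them.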
Kevin, thanks for the comments.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: dev-
[EMAIL PROTECTED]
---------------------------------------------------------------------