On Wed, 24 Jul 2002, Olivier Amira wrote:

> I would like to implement in my Lucene application a feature like
> Google's "Did you mean". So, when the user enters a wrong spelling of
> a word, the search engine automatically proposes a similar, better
> word. To implement such a function in a Lucene application, I'm not
> sure which method is best (or whether it's correct to try to do this
> with a Lucene index). Is there anybody who could help me with this?
There are a couple of different approaches to this that I'm aware of.

(1) Find a list of commonly misspelled words, detect them in a query, and
prompt the user with the corresponding correctly spelled words. Such lists
are pretty common.

Advantages: reasonably easy to implement, computationally cheap, and most
of the work (figuring out what words to flag and what words to suggest in
their place) is done statically.

Disadvantages: it will catch 'speling' mistakes but not 'spellling'
mistakes (that is, it will only recognize errors that you tell it about).

This is entirely independent of the index, unless you go to the trouble of
removing entries from this auxiliary data structure that correspond to
words that aren't in the index anyway.

(2) There's something in the Lucene API docs about a FuzzyQuery that
mentions Levenshtein distance (= string edit distance, I believe). I
haven't looked into this myself, but I would guess that you should be able
to construct a FuzzyQuery that specifies a maximum string edit distance
between a specified search term and other terms in the index.
Unfortunately, the API docs say little more than that; FuzzyTermEnum has
more information but doesn't tell you how to use FuzzyQuery. On the other
hand, at least you now know where to look in the source code. :)

Advantages: more flexible, and it seems to be built in.

Disadvantages: the docs are not helpful, and it will probably slow your
query down more than (1) would.

You could also try to write your own string edit distance calculator/data
structure, but I don't have any quick answers as to how to do that.

Good luck--

Joshua O'Madadhain

[EMAIL PROTECTED] Per Obscurius...www.ics.uci.edu/~jmadden
Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
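
For anyone who wants to try the write-your-own route mentioned above, here is a minimal sketch in plain Java that combines approach (1) with a hand-rolled Levenshtein distance. It does not use Lucene at all: the class name DidYouMean, the two entries in the misspelling table, and the in-memory vocabulary list (standing in for the terms of an index) are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Lucene API.
final class DidYouMean {
    // Approach (1): a static table of known misspellings (made-up sample entries).
    private static final Map<String, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put("speling", "spelling");
        KNOWN.put("recieve", "receive");
    }

    /** Levenshtein distance: fewest single-character inserts, deletes, or substitutions. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns a suggestion, or null if the word is known or nothing is close enough. */
    static String suggest(String word, List<String> vocabulary, int maxDistance) {
        if (vocabulary.contains(word)) {
            return null; // spelled the way the vocabulary spells it
        }
        String known = KNOWN.get(word);
        if (known != null) {
            return known; // cheap static lookup first
        }
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String term : vocabulary) { // linear scan; fine for a sketch only
            int d = editDistance(word, term);
            if (d < bestDistance) {
                bestDistance = d;
                best = term;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("spelling", "search", "index");
        System.out.println(suggest("speling", vocabulary, 2));   // spelling (from the table)
        System.out.println(suggest("spellling", vocabulary, 2)); // spelling (edit distance 1)
        System.out.println(suggest("index", vocabulary, 2));     // null (already correct)
    }
}
```

The linear scan compares the query word against every term, which is the cost concern raised under (2); for a real index you would want to narrow the candidate terms first or rely on whatever FuzzyQuery does internally.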