On 8/29/05, Chris Lu <[EMAIL PROTECTED]> wrote: > > > Two approaches I can think of: > * Use a word list(it may not be the word list you want, but it is just > a compromise). > * Analyze your original index, listing out all words inside. > > Using a word list suffers from two problems: 1. (Coverage) No word list is complete, they need to be maintained as new words are coined, and word lists for different languages vary in quality. 2. (Useless suggestions) There is little point in making a suggestion for words that aren't in the original index (as they wouldn't produce any hits).
For these reasons it is better to use the original index as a source of words. It is true that the index will likely contain spelling errors, however Lucene Spell Checker provides a way to restrict suggestions to words that are more popular than the query term. As misspellings are typically rarer than correct spellings this should ensure that misspelled suggestions are almost never made. The article quoted above (which I wrote), provides a bit more discussion. Tom