Hi
Mathieu Lecarme wrote:
On a related topic, I'm also searching for a way to suggest
alternate spelling of words to the user, when we found a word
which is very less frequent used in the index or not in the index
at all. I'm Austrian based, when I e.g. search for
"retthich" (wrong spelled "rettich" which is radish), Google
suggests me the proper spelled word. I'm searching for a way to
figure how to accomplish this, but again this may be lucene off-
topic and/or I should properly start a separate thread ...
you can use the ngram pattern and levestein distance to find near
words.
I try with phonetic and aspell dictionnary.
Thanks for the dictionary hint. So obvious, but still I haven't
thought about it until you mentioned it!
I've had really great success with the following pattern:
* after the Lucene index was created, I generate a myspell
compatible dictionary from it
* when the search returns no result, every term is ran through
myspell suggestion
* the top five myspell suggestion of each term are re-sorted by
their frequency from the Lucene index
Additionally: when the user only entered a single term, I'm also
providing the second best results as alternative (did you mean x or
y?).
The results have been very good so far, thanks again!
- Markus
I submit a patch for doing that nicely :
https://issues.apache.org/jira/browse/LUCENE-1190
Here is an example code :
LexiconReader lexiconReader = new DirectoryReader(directory);
Lexicon lexicon = new Lexicon(new RAMDirectory());
lexicon.addAnalyser(new NGramAnalyzer());
lexicon.read(lexiconReader);
QueryParser parser = new QueryParser("txt", new WhitespaceAnalyzer());
Query query = parser.parse("bio:brawn");
SuggestiveSearcher searcher = new SuggestiveSearcher(new
IndexSearcher(directory), lexicon);
SuggestiveHits hits = searcher.searchWithSuggestions(query);
System.out.println(hits.getSuggestedQuery());
In this example, a lexicon is build from a directory, with a Ngram
analyzer.
The directory contains the term "brown" in field "bio".
Hits is final, so, you can't heritate from it, that's why there is a
SuggestiveHits.
SuggestiveHits has a thresold, if there's to few answer, similar word
search is triggered.
Using myspell dictionnary is a good idea, i'll implement this.
M.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]