On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling
<christian.reuschl...@gmail.com> wrote:
> We started to implement named entity recognition on the basis of
> AnalyzingSuggester, which offers great support for synonyms, stopwords, etc.
> For this, we slightly modified AnalyzingSuggester.lookup() to only return the
> exactFirst hits (considering the exactFirst code block only, skipping the
> 'sameSurfaceForm' check and break, to get the synonym hits too).
>
> This works pretty well, and our next step would be to bring in some fuzziness
> against spelling mistakes. For this, the idea was to do exactly the same, but
> with FuzzySuggester instead.
>
> Now we have the problem that 'EXACT_FIRST' in FuzzySuggester not only relies
> on sharing the same prefix; different/misspelled terms inside the edit
> distance are also considered 'not exact', which means we get the same
> results as with AnalyzingSuggester.
>
> query: "screen"
> misspelled query: "screan"
> dictionary: "screen", "screensaver"
>
> AnalyzingSuggester hits: screen, screensaver
> AnalyzingSuggester hits on misspelled query: <empty>
> AnalyzingSuggester EXACT_FIRST hits: screen
> AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>
>
> FuzzySuggester hits: screen, screensaver
> FuzzySuggester hits on misspelled query: screen, screensaver
> FuzzySuggester EXACT_FIRST hits: screen
> FuzzySuggester EXACT_FIRST hits on misspelled query: <empty> => TARGET: screen
>
> Is there a possibility to distinguish? I see that the 'exact' criterion
> relies on an FST aspect, 'END_BYTE arc leaving'. Maybe these can be set
> differently when building the Levenshtein automata? I have no clue.

It seems like the problem is that AnalyzingSuggester checks for exactFirst
before calling .getFullPrefixPaths (which, in the FuzzySuggester subclass,
applies the fuzziness)?

Mike McCandless

http://blog.mikemccandless.com
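
To make the distinction in the example above concrete, here is a minimal plain-Java sketch of the two behaviours on the "screan" query. It does not use Lucene or its FST/automaton machinery at all: the class name FuzzyExactFirstSketch and its methods are hypothetical, matching is done by brute-force Levenshtein distance over a tiny in-memory dictionary, and details such as FuzzySuggester's non-fuzzy prefix, minimum fuzzy length, and transpositions are ignored. It only illustrates what an "exact, but within the edit distance" check would have to return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Lucene's implementation.
public class FuzzyExactFirstSketch {

    /** Classic dynamic-programming Levenshtein distance between two strings. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Fuzzy prefix lookup: an entry is a hit if any prefix of it is within maxEdits of the query. */
    public static List<String> fuzzyPrefixHits(String query, List<String> dictionary, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            for (int len = 0; len <= entry.length(); len++) {
                if (editDistance(query, entry.substring(0, len)) <= maxEdits) {
                    hits.add(entry);
                    break;
                }
            }
        }
        return hits;
    }

    /** "Fuzzy exact" lookup: an entry is a hit only if the whole entry is within maxEdits of the query. */
    public static List<String> fuzzyExactHits(String query, List<String> dictionary, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (editDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("screen", "screensaver");
        System.out.println(fuzzyPrefixHits("screan", dictionary, 1)); // prints [screen, screensaver]
        System.out.println(fuzzyExactHits("screan", dictionary, 1));  // prints [screen]
    }
}
```

The second method returns the TARGET result from the quoted message because it applies the fuzziness first and only then asks whether the whole entry was consumed. In the real suggester, that would presumably correspond to running the exactFirst check against the paths produced after the fuzzy expansion rather than before it.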