I tried it by changing the first prefixPaths initialization inside
AnalyzingSuggester.lookup(..) to the following (I simply copied the second
line from further below in the same method):

    List<FSTUtil.Path<Pair<Long,BytesRef>>> prefixPaths = FSTUtil.intersectPrefixPaths(convertAutomaton(lookupAutomaton), fst);
    prefixPaths = getFullPrefixPaths(prefixPaths, lookupAutomaton, fst);

Sadly, FuzzySuggester now gives no hits at all, even with a correctly spelled
query.

Correctly spelled query:
    prefixPaths size == 1
    fst.findTargetArc(END_BYTE, path.fstNode, scratchArc, bytesReader) returns null
    (without getFullPrefixPaths: non-null)

Query within edit distance, same result:
    prefixPaths size == 1 (without getFullPrefixPaths: 0)
    fst.findTargetArc(END_BYTE, path.fstNode, scratchArc, bytesReader) returns null

Query outside of edit distance:
    prefixPaths size == 0

It seems the fuzziness is there, but getFullPrefixPaths drops all the
END_BYTE arcs?

On 14.11.2013 17:05, Michael McCandless wrote:
> On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling
> <christian.reuschl...@gmail.com> wrote:
>> We started to implement named entity recognition on the basis of
>> AnalyzingSuggester, which offers great support for synonyms, stopwords,
>> etc. For this, we slightly modified AnalyzingSuggester.lookup() to only
>> return the exactFirst hits (considering the exactFirst code block only,
>> skipping the 'sameSurfaceForm' check and break, to get the synonym hits
>> too).
>>
>> This works pretty well, and our next step would be to bring in some
>> fuzziness against spelling mistakes. For this, the idea was to do exactly
>> the same, but with FuzzySuggester instead.
>>
>> Now we have the problem that 'EXACT_FIRST' in FuzzySuggester still relies
>> on sharing the same prefix: different/misspelled terms inside the edit
>> distance are considered 'not exact', which means we get the same results
>> as with AnalyzingSuggester.
>>
>> query:            "screen"
>> misspelled query: "screan"
>> dictionary:       "screen", "screensaver"
>>
>> AnalyzingSuggester hits:                                 screen, screensaver
>> AnalyzingSuggester hits on misspelled query:             <empty>
>> AnalyzingSuggester EXACT_FIRST hits:                     screen
>> AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>
>>
>> FuzzySuggester hits:                                     screen, screensaver
>> FuzzySuggester hits on misspelled query:                 screen, screensaver
>> FuzzySuggester EXACT_FIRST hits:                         screen
>> FuzzySuggester EXACT_FIRST hits on misspelled query:     <empty>  => TARGET: screen
>>
>> Is there a way to distinguish these cases? I see that the 'exact'
>> criterion relies on an FST aspect: an END_BYTE arc leaving the node.
>> Maybe these arcs can be set differently when building the Levenshtein
>> automata? I have no clue.
>
> It seems like the problem is that AnalyzingSuggester checks for exactFirst
> before calling .getFullPrefixPaths (which, in the FuzzySuggester subclass,
> applies the fuzziness)?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
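To make the distinction asked about in the quoted thread concrete, here is a toy sketch in plain Java. It does not use Lucene, FSTs or Levenshtein automata at all, and it is only an illustration under assumptions: the class and method names (ToyExactFirst, fuzzyHits, exactFirstHits) are hypothetical, plain strings stand in for analyzed forms, and the edit distance is ordinary Levenshtein without transpositions. An entry counts as a fuzzy hit if some prefix of it is within maxEdits of the query, and as a fuzzy "exact first" hit only if the whole entry is within maxEdits, which is a rough stand-in for the END_BYTE-arc check discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Lucene code: strings stand in for analyzed forms.
public class ToyExactFirst {

    // Plain Levenshtein distance: insertions, deletions, substitutions (no transpositions).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Smallest edit distance between the query and any prefix of the entry
    // (from the empty prefix up to the whole entry).
    static int prefixEditDistance(String query, String entry) {
        int best = Integer.MAX_VALUE;
        for (int end = 0; end <= entry.length(); end++) {
            best = Math.min(best, editDistance(query, entry.substring(0, end)));
        }
        return best;
    }

    // Fuzzy prefix completions: roughly what a fuzzy suggester returns without EXACT_FIRST.
    static List<String> fuzzyHits(String query, List<String> dict, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dict) {
            if (prefixEditDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    // Desired fuzzy "exact first" set: the whole entry, not just a prefix, is within maxEdits.
    static List<String> exactFirstHits(String query, List<String> dict, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dict) {
            if (editDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("screen", "screensaver");
        for (String query : List.of("screen", "screan")) {
            System.out.println(query + " -> hits " + fuzzyHits(query, dict, 1)
                    + ", exact first " + exactFirstHits(query, dict, 1));
        }
    }
}
```

With the dictionary from the example, main prints "screen -> hits [screen, screensaver], exact first [screen]" and "screan -> hits [screen, screensaver], exact first [screen]", i.e. the "TARGET: screen" row. One consequence of the whole-entry test: with maxEdits = 1 a short query such as "scree" would also count "screen" as exact, which AnalyzingSuggester's exact-first check would not.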