On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling
<christian.reuschl...@gmail.com> wrote:
> We started to implement named entity recognition on the basis of
> AnalyzingSuggester, which offers great support for synonyms, stopwords, etc.
> For this, we slightly modified AnalyzingSuggester.lookup() to return only the
> exactFirst hits (running only the exactFirst code block and skipping the
> 'sameSurfaceForm' check and break, so that we get the synonym hits too).
>
> This works pretty well, and our next step would be to add some fuzziness to
> tolerate spelling mistakes. For this, the idea was to do exactly the same,
> but with FuzzySuggester instead.
>
> Now we have the problem that 'EXACT_FIRST' in FuzzySuggester does not only
> rely on sharing the same prefix: different/misspelled terms within the edit
> distance are also considered 'not exact', which means we get the same
> results as with AnalyzingSuggester.
>
>
> query: "screen"
> misspelled query: "screan"
> dictionary: "screen", "screensaver"
>
> AnalyzingSuggester hits: screen, screensaver
> AnalyzingSuggester hits on misspelled query: <empty>
> AnalyzingSuggester EXACT_FIRST hits: screen
> AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>
>
> FuzzySuggester hits: screen, screensaver
> FuzzySuggester hits on misspelled query: screen, screensaver
> FuzzySuggester EXACT_FIRST hits: screen
> FuzzySuggester EXACT_FIRST hits on misspelled query: <empty> => TARGET: screen
>
>
> Is there a way to distinguish these cases? I see that the 'exact' criterion
> relies on an FST aspect, 'END_BYTE arc leaving'. Maybe these can be set
> differently when building the Levenshtein automata? I have no clue.

It seems like the problem is that AnalyzingSuggester checks for
exactFirst before calling .getFullPrefixPaths (which, in the
FuzzySuggester subclass, applies the fuzziness)?
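
To make the suspected ordering issue concrete, here is a minimal plain-Java
sketch. It is not Lucene code and does not use the FST or the Levenshtein
automata: the class name, both method names, and the brute-force edit-distance
check are assumptions made up for illustration. It only models what changes
when the "exact" test runs before versus after the fuzzy expansion, using the
dictionary and queries from the example above and an assumed maximum of one
edit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExactFirstOrderingSketch {

    /** Classic dynamic-programming Levenshtein distance (no transpositions). */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Models the reported behaviour: the exact test sees only the literal query. */
    public static List<String> exactBeforeFuzzy(String query, List<String> dictionary) {
        List<String> hits = new ArrayList<>();
        for (String term : dictionary) {
            if (term.equals(query)) {
                hits.add(term);
            }
        }
        return hits;
    }

    /** Models the target: a whole term within maxEdits counts as exact; longer completions do not. */
    public static List<String> exactAfterFuzzy(String query, List<String> dictionary, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : dictionary) {
            if (editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("screen", "screensaver");
        System.out.println(exactBeforeFuzzy("screan", dictionary));   // []
        System.out.println(exactAfterFuzzy("screan", dictionary, 1)); // [screen]
        System.out.println(exactAfterFuzzy("screen", dictionary, 1)); // [screen]
    }
}
```

In this toy model, "screensaver" is never an exact hit because it is a
completion, not a whole-term match within the edit distance. Whether the real
fix is as simple as moving the exactFirst check after the prefix-path
expansion would need to be verified against the actual suggester code.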

Mike McCandless

http://blog.mikemccandless.com
