Re: FuzzySuggester EXACT_FIRST criteria

Christian Reuschling Thu, 14 Nov 2013 11:45:06 -0800

I tried it by changing the first prefixPaths initialization to

List<FSTUtil.Path<Pair<Long,BytesRef>>> prefixPaths =
    FSTUtil.intersectPrefixPaths(convertAutomaton(lookupAutomaton), fst);
prefixPaths = getFullPrefixPaths(prefixPaths, lookupAutomaton, fst);


inside AnalyzingSuggester.lookup(..) (I simply copied the getFullPrefixPaths
line from further down in the method).

Sadly, FuzzySuggester now gives no hits at all, even with a correctly spelled
query.

Correctly spelled query:
prefixPaths size == 1
returns null: fst.findTargetArc(END_BYTE, path.fstNode, scratchArc, bytesReader)
  (without getFullPrefixPath: non-null)

Query within edit distance - the same:
prefixPaths size == 1   (without getFullPrefixPath: 0)
returns null: fst.findTargetArc(END_BYTE, path.fstNode, scratchArc, bytesReader)

Query outside of edit distance:
prefixPaths size == 0

It looks like the fuzziness is there, but does getFullPrefixPaths drop all the
END_BYTE arcs?
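
For readers unfamiliar with the check, here is a toy model of the 'END_BYTE
arc leaving' criterion mentioned in this thread. It is not Lucene code: a plain
trie stands in for the FST, an arbitrary end marker stands in for END_BYTE, and
all names (ToyTrie, walk, isPrefix, isExact) are made up for illustration. It
only shows why a query counts as 'exact' when the node it reaches has an
end-marker arc leaving it, and as a mere prefix otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the suggester's FST; NOT the Lucene data structure.
final class ToyTrie {
    // Hypothetical end marker playing the role of END_BYTE in this sketch.
    static final char END = '\0';

    static final class Node {
        final Map<Character, Node> arcs = new HashMap<>();
    }

    final Node root = new Node();

    // Stores the key followed by the end marker, so complete keys are marked.
    void add(String key) {
        Node n = root;
        for (char c : (key + END).toCharArray()) {
            n = n.arcs.computeIfAbsent(c, k -> new Node());
        }
    }

    // Follows the query from the root; returns null if no such path exists.
    Node walk(String query) {
        Node n = root;
        for (char c : query.toCharArray()) {
            n = n.arcs.get(c);
            if (n == null) {
                return null;
            }
        }
        return n;
    }

    // True if at least one stored key starts with the query.
    boolean isPrefix(String query) {
        return walk(query) != null;
    }

    // True only if an end-marker arc leaves the node reached by the query.
    boolean isExact(String query) {
        Node n = walk(query);
        return n != null && n.arcs.containsKey(END);
    }

    public static void main(String[] args) {
        ToyTrie t = new ToyTrie();
        t.add("screen");
        t.add("screensaver");
        System.out.println(t.isExact("screen"));   // true
        System.out.println(t.isExact("screens"));  // false
        System.out.println(t.isPrefix("screens")); // true
        System.out.println(t.isPrefix("screan"));  // false
    }
}
```

In this model the misspelled query "screan" never reaches a node at all, which
matches the empty AnalyzingSuggester results in the quoted example below.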



On 14.11.2013 17:05, Michael McCandless wrote:
> On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling 
> <christian.reuschl...@gmail.com> wrote:
>> We started to implement named entity recognition on the basis of
>> AnalyzingSuggester, which offers great support for synonyms, stopwords,
>> etc. For this, we slightly modified AnalyzingSuggester.lookup() to return
>> only the exactFirst hits (using the exactFirst code block only, and
>> skipping the 'sameSurfaceForm' check and break, so that we get the synonym
>> hits too).
>> 
>> This works pretty well, and our next step would be to bring in some
>> fuzziness to cope with spelling mistakes. For this, the idea was to do
>> exactly the same, but with FuzzySuggester instead.
>> 
>> Now we have the problem that 'EXACT_FIRST' in FuzzySuggester does not only
>> rely on sharing the same prefix: different/misspelled terms inside the
>> edit distance are also considered 'not exact', which means we get the same
>> results as with AnalyzingSuggester.
>> 
>> 
>> query: "screen"
>> misspelled query: "screan"
>> dictionary: "screen", "screensaver"
>>
>> AnalyzingSuggester hits: screen, screensaver
>> AnalyzingSuggester hits on misspelled query: <empty>
>> AnalyzingSuggester EXACT_FIRST hits: screen
>> AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>
>>
>> FuzzySuggester hits: screen, screensaver
>> FuzzySuggester hits on misspelled query: screen, screensaver
>> FuzzySuggester EXACT_FIRST hits: screen
>> FuzzySuggester EXACT_FIRST hits on misspelled query: <empty> => TARGET: screen
>> 
>> 
>> Is there a possibility to distinguish? I see that the 'exact' criterion
>> relies on an FST aspect, 'END_BYTE arc leaving'. Maybe these can be set
>> differently when building the Levenshtein automata? I have no clue.
> 
> It seems like the problem is that AnalyzingSuggester checks for exactFirst
> before calling .getFullPrefixPaths (which, in the FuzzySuggester subclass,
> applies the fuzziness)?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
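
The behaviour asked for in the quoted example (query "screan" should yield
"screen" but not "screensaver") can be stated independently of the FST: keep
only those dictionary entries whose complete key lies within the allowed edit
distance of the query. The following is a brute-force sketch of that target
behaviour in plain Java. It is not how FuzzySuggester works internally and it
ignores analysis, weights, and any FuzzySuggester options; the names
FuzzyExactSketch, editDistance, and fuzzyExactHits are hypothetical, and the
edit limit of 1 in main is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Brute-force illustration of "exact within edit distance"; NOT Lucene code.
final class FuzzyExactSketch {

    // Classic Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Keeps entries whose WHOLE key is within maxEdits of the query,
    // so longer completions such as "screensaver" are filtered out.
    static List<String> fuzzyExactHits(String query, List<String> dictionary,
                                       int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (editDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("screen", "screensaver");
        System.out.println(fuzzyExactHits("screen", dictionary, 1)); // [screen]
        System.out.println(fuzzyExactHits("screan", dictionary, 1)); // [screen]
    }
}
```

A linear scan like this does not scale to a large dictionary; it is only meant
to pin down which hits the combined "fuzzy and exact" lookup should return.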
