I tried it by changing the first prefixPaths initialization inside
AnalyzingSuggester.lookup(..) to

List<FSTUtil.Path<Pair<Long,BytesRef>>> prefixPaths =
FSTUtil.intersectPrefixPaths(convertAutomaton(lookupAutomaton), fst);
prefixPaths = getFullPrefixPaths(prefixPaths, lookupAutomaton, fst);

(I simply copied the getFullPrefixPaths line from further below.)
Sadly, FuzzySuggester now gives no hits at all, even with a correctly spelled
query.

Correctly spelled query:
prefixPaths size == 1
returns null: fst.findTargetArc(END_BYTE, path.fstNode, scratchArc, bytesReader)
(without getFullPrefixPaths: non-null)

Query within edit distance, same result:
prefixPaths size == 1 (without getFullPrefixPaths: 0)
returns null: fst.findTargetArc(END_BYTE, path.fstNode, scratchArc, bytesReader)

Query outside of edit distance:
prefixPaths size == 0

It seems the fuzziness is applied, but after getFullPrefixPaths none of the
paths has a leaving END_BYTE arc any more?
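
To make the behaviour I am after concrete, here is a minimal plain-Java sketch
of the distinction. It is not Lucene code and does not touch the FST: the class
FuzzyExactSketch and the methods editDistance, fuzzyPrefix and fuzzyExact are
made up for illustration, and it ignores analysis, weights, transpositions and
the non-fuzzy prefix handling of FuzzySuggester. "Fuzzy prefix" means some
prefix of a dictionary entry is within the edit distance of the query; "fuzzy
exact" means the whole entry is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
class FuzzyExactSketch {

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Entries for which SOME prefix lies within maxEdits of the query
    // (roughly what the normal fuzzy lookup returns).
    static List<String> fuzzyPrefix(List<String> dict, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dict) {
            for (int len = 0; len <= entry.length(); len++) {
                if (editDistance(query, entry.substring(0, len)) <= maxEdits) {
                    hits.add(entry);
                    break;
                }
            }
        }
        return hits;
    }

    // Entries whose WHOLE form lies within maxEdits of the query
    // (the "exact first, but fuzzy" result I would like to get).
    static List<String> fuzzyExact(List<String> dict, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String entry : dict) {
            if (editDistance(query, entry) <= maxEdits) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("screen", "screensaver");
        System.out.println(fuzzyPrefix(dict, "screan", 1)); // [screen, screensaver]
        System.out.println(fuzzyExact(dict, "screan", 1));  // [screen]
    }
}
```

With the dictionary from my first mail, the misspelled query "screan" and one
allowed edit, fuzzyPrefix gives screen and screensaver, while fuzzyExact gives
only screen, which is the TARGET result.
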
On 14.11.2013 17:05, Michael McCandless wrote:
> On Wed, Nov 13, 2013 at 12:04 PM, Christian Reuschling
> <[email protected]> wrote:
>> We started to implement named entity recognition on the basis of
>> AnalyzingSuggester, which offers great support for synonyms, stopwords,
>> etc. For this, we slightly modified AnalyzingSuggester.lookup() to only
>> return the exactFirst hits (considering the exactFirst code block only,
>> skipping the 'sameSurfaceForm' check and break, to get the synonym hits
>> too).
>>
>> This works pretty well, and our next step would be to bring in some
>> fuzziness against spelling mistakes. For this, the idea was to do exactly
>> the same, but with FuzzySuggester instead.
>>
>> Now we have the problem that 'EXACT_FIRST' in FuzzySuggester does not only
>> rely on sharing the same prefix: different/misspelled terms inside the
>> edit distance are also considered 'not exact', which means we get the same
>> results as with AnalyzingSuggester.
>>
>>
>> query: "screen"
>> misspelled query: "screan"
>> dictionary: "screen", "screensaver"
>>
>> AnalyzingSuggester hits: screen, screensaver
>> AnalyzingSuggester hits on misspelled query: <empty>
>> AnalyzingSuggester EXACT_FIRST hits: screen
>> AnalyzingSuggester EXACT_FIRST hits on misspelled query: <empty>
>>
>> FuzzySuggester hits: screen, screensaver
>> FuzzySuggester hits on misspelled query: screen, screensaver
>> FuzzySuggester EXACT_FIRST hits: screen
>> FuzzySuggester EXACT_FIRST hits on misspelled query: <empty> => TARGET: screen
>>
>>
>> Is there a possibility to distinguish? I see that the 'exact' criterion
>> relies on an FST aspect, 'END_BYTE arc leaving'. Maybe these can be set
>> differently when building the Levenshtein automata? I have no clue.
>
> It seems like the problem is that AnalyzingSuggester checks for exactFirst
> before calling .getFullPrefixPaths (which, in the FuzzySuggester subclass,
> applies the fuzziness)?
>
> Mike McCandless
>
> http://blog.mikemccandless.com