[ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607085#comment-13607085 ]
Michael McCandless commented on LUCENE-4845:
--------------------------------------------
I like this approach (adding epsilon transitions after the automaton is built)!
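A toy sketch of how epsilon transitions added after construction turn a prefix automaton into an infix one. It assumes the epsilon edges run from the initial state to every state where a new token of a suggestion begins, so a lookup can start matching at any token boundary. It is a plain-Python character trie, not Lucene's FST or Automaton classes, and `build_trie`, `add_token_epsilons`, `epsilon_closure` and `lookup` are hypothetical names. It only illustrates the matching semantics and says nothing about the RAM or FST size of the real build.

```python
from collections import defaultdict


def build_trie(suggestions):
    """Build a character trie over whole suggestions (state 0 is the initial state).

    Returns (edges, finals, token_starts): edges[state] maps a character to the
    next state, finals[state] holds suggestions ending there, and token_starts
    holds every state reached right after a space, i.e. where a new token begins.
    """
    edges = [{}]
    finals = defaultdict(set)
    token_starts = set()
    for s in suggestions:
        state = 0
        for i, ch in enumerate(s):
            nxt = edges[state].get(ch)
            if nxt is None:
                nxt = len(edges)
                edges.append({})
                edges[state][ch] = nxt
            state = nxt
            if ch == " " and i + 1 < len(s):
                token_starts.add(state)
        finals[state].add(s)
    return edges, finals, token_starts


def add_token_epsilons(token_starts):
    """Added after construction: epsilon edges from the initial state to each token start."""
    return {0: set(token_starts)}


def epsilon_closure(states, eps):
    """All states reachable from `states` via epsilon edges only."""
    seen, stack = set(states), list(states)
    while stack:
        for t in eps.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def lookup(prefix, edges, finals, eps):
    """Return suggestions whose text, starting at some token boundary, begins with `prefix`."""
    states = epsilon_closure({0}, eps)
    for ch in prefix:
        states = epsilon_closure({edges[s][ch] for s in states if ch in edges[s]}, eps)
        if not states:
            return set()
    # Walk forward from the surviving states and collect every completed suggestion.
    results, seen, stack = set(), set(states), list(states)
    while stack:
        s = stack.pop()
        results |= finals.get(s, set())
        for t in edges[s].values():
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return results
```

With suggestions `["led zeppelin iv", "the who", "zeppelin rising"]`, `lookup("zep", ...)` returns both Zeppelin titles once the epsilon edges are added, but only `"zeppelin rising"` when `eps` is empty, which is plain prefix matching on the same trie.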
I managed to build the FreeDB suggester with this approach, but it required a lot of
RAM: it OOM'd with a 14 GB heap and only finished successfully with a 20 GB heap.
It was also slow to build and produced a big FST (more than 2X larger
than the index):
* 2466 sec to build
* FST is 8.6 GB
* Prefix 2: 2527.5 lookups/sec
* Prefix 4: 1681.7 lookups/sec
* Prefix 6: 1948.3 lookups/sec
* Prefix 8: 2050.9 lookups/sec
* Prefix 10: 2076.0 lookups/sec
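The throughputs above translate into a mean cost per lookup of 1000 / (lookups/sec) milliseconds, assuming the lookups were run one at a time (the comment does not say how many threads were used). A small sketch of that arithmetic, also converting the build time to minutes:

```python
# Lookups/sec per prefix length, copied from the measurements above.
throughput = {2: 2527.5, 4: 1681.7, 6: 1948.3, 8: 2050.9, 10: 2076.0}


def mean_latency_ms(lookups_per_sec):
    """Mean time per lookup in milliseconds, assuming lookups ran sequentially."""
    return 1000.0 / lookups_per_sec


build_minutes = 2466 / 60  # 2466 sec build time expressed in minutes

for prefix_len, qps in sorted(throughput.items()):
    print(f"prefix {prefix_len:2d}: {mean_latency_ms(qps):.3f} ms/lookup")
print(f"build: {build_minutes:.1f} min")
```

Under that assumption every prefix length stays well under a millisecond per lookup (about 0.40 ms for prefix 2 and about 0.59 ms for the slowest case, prefix 4), and the build takes about 41 minutes.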
We should try the N prefix limit, but I don't really like that. Maybe we
should just offer both approaches.
> Add AnalyzingInfixSuggester
> ---------------------------
>
> Key: LUCENE-4845
> URL: https://issues.apache.org/jira/browse/LUCENE-4845
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spellchecker
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, 4.3
>
> Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch,
> LUCENE-4845.patch
>
>
> Our current suggester impls do prefix matching of the incoming text
> against all compiled suggestions, but in some cases it's useful to
> allow infix matching. E.g., Netflix does infix suggestions in their
> search box.
> I did a straightforward impl, just using a normal Lucene index, and
> using PostingsHighlighter to highlight matching tokens in the
> suggestions.
> I think this likely only works well when your suggestions have a
> strong prior ranking (weight input to build), e.g., Netflix knows
> the popularity of movies.
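To make the infix-versus-prefix distinction and the role of the build-time weight concrete, here is a toy, pure-Python stand-in for the index-based approach described above. It is not Lucene and not the patch: a dict-based inverted index replaces the Lucene index, a naive string rewrite replaces PostingsHighlighter, and `ToyInfixSuggester` and its methods are hypothetical names. The assumed matching rule is that every query token except the last must equal a suggestion token, and the last one only has to be a prefix of some suggestion token.

```python
import heapq
from collections import defaultdict


class ToyInfixSuggester:
    """Hypothetical stand-in: token-level inverted index plus ranking by weight."""

    def __init__(self, entries):
        # entries: iterable of (suggestion text, weight supplied at build time)
        self.entries = list(entries)
        self.postings = defaultdict(set)  # lowercased token -> ids of entries containing it
        for doc_id, (text, _) in enumerate(self.entries):
            for token in text.lower().split():
                self.postings[token].add(doc_id)

    def lookup(self, query, k=5):
        tokens = query.lower().split()
        if not tokens:
            return []
        *full, last = tokens
        # Every complete query token must match an indexed token exactly...
        candidates = None
        for t in full:
            ids = self.postings.get(t, set())
            candidates = ids if candidates is None else candidates & ids
        # ...and the last (possibly partial) token may match any token's prefix.
        prefix_ids = set()
        for token, ids in self.postings.items():
            if token.startswith(last):
                prefix_ids |= ids
        candidates = prefix_ids if candidates is None else candidates & prefix_ids
        # The prior weight decides which matches make the top k.
        best = heapq.nlargest(k, candidates, key=lambda i: self.entries[i][1])
        return [(self._highlight(self.entries[i][0], full, last), self.entries[i][1])
                for i in best]

    @staticmethod
    def _highlight(text, full, last):
        """Wrap whole-token matches, and the matched prefix of the last token, in <b> tags."""
        out = []
        for word in text.split():
            w = word.lower()
            if w in full:
                out.append(f"<b>{word}</b>")
            elif w.startswith(last):
                out.append(f"<b>{word[:len(last)]}</b>{word[len(last):]}")
            else:
                out.append(word)
        return " ".join(out)
```

For example, with `[("The Dark Knight", 90), ("Knight and Day", 40), ("A Knight's Tale", 60)]`, the query `"kni"` matches all three titles even though none of them starts with "kni", and they come back ordered by weight (90, 60, 40). Because infix matching returns a superset of what prefix matching would, the weight carries most of the burden of picking the top k, which is why a strong prior ranking matters here.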