[jira] [Created] (LUCENE-4481) AnalyzingSuggester may fail to return correct topN suggestions

Michael McCandless (JIRA) Fri, 12 Oct 2012 08:15:04 -0700

Michael McCandless created LUCENE-4481:
------------------------------------------


             Summary: AnalyzingSuggester may fail to return correct topN suggestions
                 Key: LUCENE-4481
                 URL: https://issues.apache.org/jira/browse/LUCENE-4481
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Michael McCandless
             Fix For: 4.1, 5.0


I hit this when working on LUCENE-4480.

AnalyzingSuggester may prune some of the topN paths found by FST's 
Util.TopNSearcher.  Because of that, limiting the queue size to topN makes the 
overall search inadmissible, i.e. it may incorrectly prune paths that would 
have led to a competitive path.

However, such pruning is rare: it happens only for graph token streams, and 
even then only when competitive analyzed forms share the same surface forms.

The simplest way to fix this is to make the queue unbounded, but this is likely 
a sizable performance hit ... I haven't tested yet.  It's even possible that the 
way the dups happen (always at the "end" of the suggestion, because we tack on 
a 0 byte followed by an ord dedup byte) prevents this bug from even occurring, 
and so this could all be a false alarm!  I have to try to make a test case 
showing it ...

A cop-out solution would be to expose a separate queueSize or queueMultiplier 
(over the topN) so that if users are affected by this they could crank up the 
queue size or multiplier.
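
To make the concern concrete, here is a toy model in plain Java.  It is not 
Lucene code and not a reproduction of the bug: the class BoundedQueueDemo, the 
Path type, the suggest method and the sample strings are all hypothetical, and 
it ignores how the FST search actually expands paths.  It only shows the 
general effect: if a queue bounded at topN is filled first and duplicates of a 
surface form are pruned afterwards, a competitive suggestion can already be 
gone.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Toy model only; every name in this class is hypothetical, not a Lucene API. */
class BoundedQueueDemo {

    /** One analyzed path: the surface form it maps to and its cost (lower is better). */
    static final class Path {
        final String surface;
        final int cost;

        Path(String surface, int cost) {
            this.surface = surface;
            this.cost = cost;
        }
    }

    /**
     * Keeps the queueSize cheapest paths, then drops paths whose surface form
     * was already seen, and returns at most topN distinct surface forms.
     */
    static List<String> suggest(List<Path> paths, int topN, int queueSize) {
        // Max-heap on cost, so the most expensive retained path is evicted first.
        PriorityQueue<Path> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(b.cost, a.cost));
        for (Path p : paths) {
            queue.add(p);
            if (queue.size() > queueSize) {
                queue.poll(); // bounded queue: discard the most expensive path
            }
        }

        List<Path> kept = new ArrayList<>(queue);
        kept.sort((a, b) -> Integer.compare(a.cost, b.cost));

        // Pruning after the search: duplicate surface forms are removed.
        Set<String> unique = new LinkedHashSet<>();
        for (Path p : kept) {
            if (unique.size() < topN) {
                unique.add(p.surface);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<Path> paths = List.of(
            new Path("wifi network", 1),
            new Path("wifi network", 2), // second analyzed form, same surface form
            new Path("wifi router", 3),
            new Path("wifi password", 4));
        int topN = 3;
        System.out.println(suggest(paths, topN, topN));     // queue bounded at topN
        System.out.println(suggest(paths, topN, topN * 2)); // queue size = topN * multiplier
    }
}
```

In this toy, the queue bounded at topN returns only two suggestions, because 
the duplicate "wifi network" path takes a slot that "wifi password" needed.  
Doubling the queue size recovers all three.  Whether the real suggester can 
actually hit this case is exactly what still needs a test.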
