[ https://issues.apache.org/jira/browse/LUCENE-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480366#comment-13480366 ]
Michael McCandless commented on LUCENE-4481:
--------------------------------------------

{quote}
Though this is a pretty big hit (I look at prefixes 2-4), let's commit the fix for the bug first and then go back around to optimizations.
{quote}

I agree -- I'll make two commits here: first fixing the bugs, then adding some optimizations back. The optimizations don't fully recover the perf loss but they get much of it back ...

> AnalyzingSuggester may fail to return correct topN suggestions
> --------------------------------------------------------------
>
>                 Key: LUCENE-4481
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4481
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4481.patch, LUCENE-4481.patch, LUCENE-4481.patch,
> LUCENE-4481.patch
>
>
> I hit this when working on LUCENE-4480.
> Because AnalyzingSuggester may prune some of the topN paths found by FST's
> Util.TopNSearcher, the queue size limit of topN makes the overall search
> inadmissible, i.e. it may incorrectly prune paths that would have led to a
> competitive path.
> However, such pruning is rare: it happens only for graph token streams, and
> even then only when competitive analyzed forms share the same surface forms.
> The simplest way to fix this is to make the queue unbounded, but this is
> likely a sizable performance hit ... I haven't tested yet. It's even
> possible that the way the dups happen (always at the "end" of the suggestion,
> because we tack on a 0 byte followed by an ord dedup byte) prevents this bug
> from even occurring, and so this could all be a false alarm! I have to try to
> make a test case showing it ...
> A cop-out solution would be to expose a separate queueSize or queueMultiplier
> (over the topN) so that if users are affected by this they could crank up the
> queue size or multiplier.
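
To make the failure mode in the description concrete, here is a minimal toy sketch in plain Java. It is not Lucene's Util.TopNSearcher or AnalyzingSuggester: the Candidate class, the suggest() method, its queueSize parameter and the sample data are all hypothetical, and the real search prunes partial FST paths rather than a flat list of completed ones. It only shows that when the queue is capped at topN and some queued paths are later rejected as duplicate surface forms, a path that should have been returned can already be gone, and that a larger queue (the "queueMultiplier" idea) avoids this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Toy model only: NOT Lucene's Util.TopNSearcher or AnalyzingSuggester.
class BoundedQueueDemo {

  /** Hypothetical completed path: one analyzed form that maps to a surface form. */
  static final class Candidate {
    final String analyzed;
    final String surface;
    final int cost; // lower cost is better

    Candidate(String analyzed, String surface, int cost) {
      this.analyzed = analyzed;
      this.surface = surface;
      this.cost = cost;
    }
  }

  static final Comparator<Candidate> BY_COST =
      Comparator.comparingInt((Candidate c) -> c.cost);

  /** Keeps at most queueSize paths, then drops duplicate surface forms. */
  static List<String> suggest(List<Candidate> candidates, int topN, int queueSize) {
    // Max-heap on cost: the worst retained path sits at the head and is evicted first.
    PriorityQueue<Candidate> queue = new PriorityQueue<>(BY_COST.reversed());
    for (Candidate c : candidates) {
      queue.add(c);
      if (queue.size() > queueSize) {
        queue.poll(); // bounded-queue pruning: a path we need later may be lost here
      }
    }

    List<Candidate> kept = new ArrayList<>(queue);
    kept.sort(BY_COST);

    // Post-filter: two analyzed forms with the same surface form count only once.
    Set<String> seen = new HashSet<>();
    List<String> results = new ArrayList<>();
    for (Candidate c : kept) {
      if (results.size() == topN) {
        break;
      }
      if (seen.add(c.surface)) {
        results.add(c.surface);
      }
    }
    return results;
  }

  /** Two analyzed forms (as a graph token stream might produce) share one surface form. */
  static List<Candidate> sample() {
    return Arrays.asList(
        new Candidate("wifi network", "wifi network", 1),
        new Candidate("wi fi network", "wifi network", 2),
        new Candidate("wifi router", "wifi router", 3));
  }

  public static void main(String[] args) {
    // Queue capped at topN: both slots go to the same surface form.
    System.out.println(suggest(sample(), 2, 2)); // prints [wifi network]
    // Queue of topN * 2 (a hypothetical multiplier): the second suggestion survives.
    System.out.println(suggest(sample(), 2, 4)); // prints [wifi network, wifi router]
  }
}
```

In this toy, an unbounded queue is just a queueSize at least as large as the candidate list. The cost of that in the real suggester is the performance hit discussed in the comment above.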