[ 
https://issues.apache.org/jira/browse/LUCENE-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4481:
---------------------------------------

    Attachment: LUCENE-4481.patch

OK, new patch, this time adding back some optimizations:

{noformat}
[junit4:junit4] Suite: org.apache.lucene.search.suggest.LookupBenchmarkTest
[junit4:junit4]   2> -- RAM consumption
[junit4:junit4]   2> JaspellLookup   size[B]:    9,815,152
[junit4:junit4]   2> TSTLookup       size[B]:    9,858,792
[junit4:junit4]   2> FSTCompletionLookup size[B]:      466,520
[junit4:junit4]   2> WFSTCompletionLookup size[B]:      507,640
[junit4:junit4]   2> AnalyzingSuggester size[B]:      889,138
[junit4:junit4] OK      1.67s | LookupBenchmarkTest.testStorageNeeds
[junit4:junit4]   2> -- prefixes: 6-9, num: 7, onlyMorePopular: true
[junit4:junit4]   2> JaspellLookup   queries: 50001, time[ms]: 108 [+- 8.81], ~kQPS: 464
[junit4:junit4]   2> TSTLookup       queries: 50001, time[ms]: 79 [+- 1.07], ~kQPS: 631
[junit4:junit4]   2> FSTCompletionLookup queries: 50001, time[ms]: 148 [+- 2.54], ~kQPS: 339
[junit4:junit4]   2> WFSTCompletionLookup queries: 50001, time[ms]: 67 [+- 2.78], ~kQPS: 745
[junit4:junit4]   2> AnalyzingSuggester queries: 50001, time[ms]: 260 [+- 3.92], ~kQPS: 192
[junit4:junit4] OK      14.6s | LookupBenchmarkTest.testPerformanceOnPrefixes6_9
[junit4:junit4]   2> -- prefixes: 2-4, num: 7, onlyMorePopular: true
[junit4:junit4]   2> JaspellLookup   queries: 50001, time[ms]: 262 [+- 5.16], ~kQPS: 191
[junit4:junit4]   2> TSTLookup       queries: 50001, time[ms]: 641 [+- 6.46], ~kQPS: 78
[junit4:junit4]   2> FSTCompletionLookup queries: 50001, time[ms]: 118 [+- 2.95], ~kQPS: 424
[junit4:junit4]   2> WFSTCompletionLookup queries: 50001, time[ms]: 239 [+- 4.84], ~kQPS: 210
[junit4:junit4]   2> AnalyzingSuggester queries: 50001, time[ms]: 660 [+- 7.39], ~kQPS: 76
[junit4:junit4] OK      39.0s | LookupBenchmarkTest.testPerformanceOnPrefixes2_4
[junit4:junit4]   2> -- construction time
[junit4:junit4]   2> JaspellLookup   input: 50001, time[ms]: 23 [+- 4.20]
[junit4:junit4]   2> TSTLookup       input: 50001, time[ms]: 64 [+- 2.06]
[junit4:junit4]   2> FSTCompletionLookup input: 50001, time[ms]: 120 [+- 2.11]
[junit4:junit4]   2> WFSTCompletionLookup input: 50001, time[ms]: 88 [+- 1.09]
[junit4:junit4]   2> AnalyzingSuggester input: 50001, time[ms]: 245 [+- 27.85]
[junit4:junit4] OK      10.9s | LookupBenchmarkTest.testConstructionTime
[junit4:junit4]   2> -- prefixes: 100-200, num: 7, onlyMorePopular: true
[junit4:junit4]   2> JaspellLookup   queries: 50001, time[ms]: 68 [+- 1.17], ~kQPS: 731
[junit4:junit4]   2> TSTLookup       queries: 50001, time[ms]: 31 [+- 2.82], ~kQPS: 1617
[junit4:junit4]   2> FSTCompletionLookup queries: 50001, time[ms]: 141 [+- 1.97], ~kQPS: 354
[junit4:junit4]   2> WFSTCompletionLookup queries: 50001, time[ms]: 45 [+- 3.37], ~kQPS: 1099
[junit4:junit4]   2> AnalyzingSuggester queries: 50001, time[ms]: 233 [+- 4.02], ~kQPS: 215
[junit4:junit4] OK      11.1s | LookupBenchmarkTest.testPerformanceOnFullHits
[junit4:junit4] Completed in 77.54s, 5 tests
{noformat}

I added a second parameter (maxQueueDepth) to TopNSearcher, and fixed
WFSTSuggester to pass topN for it (this should recover most of its
performance).  I also fixed AnalyzingSuggester: we can bound how big a
queue we need by the worst-case number of analyzed forms for a single
surface form.  This is nice because if the analyzer doesn't create a
graph then we should get close to the same performance as before.
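
To make the queue-depth argument concrete, here is a minimal sketch, not
the code in the patch.  BoundedQueueSketch, Candidate and topN() are
hypothetical names, and the model is simplified: every candidate is an
already completed path offered to the queue, rather than FST arcs being
expanded step by step.  The sizing rule topN * (worst-case analyzed
forms per surface form) is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the patch.
class BoundedQueueSketch {

    /** One analyzed form of a surface form; lower cost is better. */
    static final class Candidate {
        final String analyzed;
        final String surface;
        final int cost;

        Candidate(String analyzed, String surface, int cost) {
            this.analyzed = analyzed;
            this.surface = surface;
            this.cost = cost;
        }
    }

    /** Returns up to topN distinct surface forms, cheapest first. */
    static List<String> topN(List<Candidate> candidates, int topN, int maxQueueDepth) {
        Comparator<Candidate> byCost = Comparator
            .comparingInt((Candidate c) -> c.cost)
            .thenComparing(c -> c.analyzed);
        TreeSet<Candidate> queue = new TreeSet<>(byCost);

        // Bounded queue: once it is over capacity, drop the worst entry.
        for (Candidate c : candidates) {
            queue.add(c);
            if (queue.size() > maxQueueDepth) {
                queue.pollLast();
            }
        }

        // Pop best-first, rejecting duplicate surface forms.  These
        // rejections are what make a queue of size topN too small.
        Set<String> seen = new HashSet<>();
        List<String> results = new ArrayList<>();
        while (!queue.isEmpty() && results.size() < topN) {
            Candidate c = queue.pollFirst();
            if (seen.add(c.surface)) {
                results.add(c.surface);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Candidate> input = Arrays.asList(
            new Candidate("foo_a", "foo", 1),
            new Candidate("foo_b", "foo", 2),   // second analyzed form of "foo"
            new Candidate("bar_a", "bar", 3));

        // Queue depth == topN: "bar" is pruned before the duplicate is rejected.
        System.out.println(topN(input, 2, 2));      // prints [foo]

        // Queue depth == topN * 2 analyzed forms per surface form.
        System.out.println(topN(input, 2, 2 * 2));  // prints [foo, bar]
    }
}
```

When every surface form has exactly one analyzed form, nothing is
rejected as a duplicate, so in this model a depth of topN is already
enough and the bound costs nothing extra.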

                
> AnalyzingSuggester may fail to return correct topN suggestions
> --------------------------------------------------------------
>
>                 Key: LUCENE-4481
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4481
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4481.patch, LUCENE-4481.patch, LUCENE-4481.patch, 
> LUCENE-4481.patch
>
>
> I hit this when working on LUCENE-4480.
> Because AnalyzingSuggester may prune some of the topN paths found by FST's 
> Util.TopNSearcher, the queue size limit of topN makes the overall 
> search inadmissible, i.e. it may incorrectly prune paths that would have led 
> to a competitive path.
> However, such pruning is rare: it happens only for graph token streams, and 
> even then only when competitive analyzed forms share the same surface forms.
> The simplest way to fix this is to make the queue unbounded but this is 
> likely a sizable performance hit ... I haven't tested yet.  It's even 
> possible the way the dups happen (always at the "end" of the suggestion, 
> because we tack on a 0 byte followed by an ord dedup byte) prevents this bug 
> from even occurring and so this could all be a false alarm!  I have to try to make 
> a test case showing it ...
> A cop-out solution would be to expose a separate queueSize or queueMultiplier 
> (over the topN) so that if users are affected by this they could crank up the 
> queue size or multiplier.

