[
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656728#action_12656728
]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------
OK, I ran an initial test, though since the ord approach is still a
bit buggy we can't be sure how far to trust these results.
I indexed the first 2M docs from Wikipedia into a 101-segment index,
then searched for "text" (97K hits), sorting by title and pulling the
best 100 hits. I ran the search 1000 times in each round.
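For reading the tables that follow: the rec/s column is simply the number of searches in a round divided by the elapsed seconds for that round. Below is a minimal sketch of that arithmetic, not the benchmark code itself; the class name `RoundThroughput`, the method names, and the placeholder task are all hypothetical and stand in for the real search-with-sort call.

```java
import java.util.Locale;

public class RoundThroughput {

    /** rec/s as reported per round: (runCnt * recsPerRun) / elapsedSec. */
    static double recsPerSec(int runCnt, int recsPerRun, double elapsedSec) {
        return (runCnt * (double) recsPerRun) / elapsedSec;
    }

    /** Runs the task recsPerRun times and returns the elapsed wall time in seconds. */
    static double timeRound(Runnable task, int recsPerRun) {
        long start = System.nanoTime();
        for (int i = 0; i < recsPerRun; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Placeholder work; in the real test this would be one sorted search.
        Runnable placeholderSearch = () -> Math.sqrt(12345.0);

        double elapsed = timeRound(placeholderSearch, 1000);
        System.out.printf(Locale.ROOT, "measured: %.1f rec/s%n",
                recsPerSec(1, 1000, elapsed));

        // Cross-check against a reported row: 1000 searches in 9.34 sec.
        System.out.printf(Locale.ROOT, "trunk round 1: %.1f rec/s%n",
                recsPerSec(1, 1000, 9.34));
    }
}
```

The "best" figure quoted for each variant is the highest rec/s across its rounds.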
Current trunk (best 107.1 searches/sec):
{code}
Operation             round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
XSearchWarm               0       1           1    0.0       93.64  463,373,760  1,029,046,272
XSearchWithSort_1000      0       1        1000  100.6        9.94  463,373,760  1,029,046,272
XSearchWithSort_1000      1       1        1000  107.1        9.34  572,969,344  1,029,046,272
XSearchWithSort_1000      2       1        1000  105.5        9.48  572,969,344  1,029,046,272
XSearchWithSort_1000      3       1        1000  106.2        9.41  587,068,928  1,029,046,272
{code}
Patch STRING_ORD (best 102.0 searches/sec):
{code}
Operation             round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
XSearchWarm               0       1           1    0.5        2.16  384,153,600  1,029,046,272
XSearchWithSort_1000      0       1        1000   94.1       10.63  439,173,824  1,029,046,272
XSearchWithSort_1000      1       1        1000  100.7        9.93  439,173,824  1,029,046,272
XSearchWithSort_1000      2       1        1000  101.9        9.81  573,822,208  1,029,046,272
XSearchWithSort_1000      3       1        1000  102.0        9.81  573,822,208  1,029,046,272
{code}
Patch STRING_VAL (best 34.6 searches/sec):
{code}
Operation             round  runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
XSearchWarm               0       1           1    0.4        2.24  368,201,088  1,029,046,272
XSearchWithSort_1000      0       1        1000   34.6       28.94  415,107,648  1,029,046,272
XSearchWithSort_1000      1       1        1000   33.9       29.54  415,107,648  1,029,046,272
XSearchWithSort_1000      2       1        1000   33.9       29.46  545,339,904  1,029,046,272
XSearchWithSort_1000      3       1        1000   34.0       29.40  545,339,904  1,029,046,272
{code}
Notes:
* Populating the field cache on trunk for MultiReader is
fantastically costly (94 sec, versus about 2 sec with either patch).
The IO cache was already hot so this isn't IO latency. I think
MultiTermEnum/Docs behaves badly for this use case (a single unique
term (title) per doc). We really need to switch to column-stride
fields, rather than un-inverting, for this.
* For this case at least, STRING_ORD is still quite a bit faster than
STRING_VAL (102.0 vs 34.6 searches/sec); however, it's still buggy.
Maybe a smaller queue size (e.g. 10 or 20) would make them closer.
* STRING_ORD is still a bit slower than trunk's sort (102.0 vs 107.1
searches/sec); hopefully once tuned it'll be closer.
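To put the notes above in relative terms, here is a small sketch that computes ratios from the best rec/s and warm-up times reported in the three tables. The class and method names (`SortThroughputRatios`, `ratio`) are hypothetical; the numbers are copied from the tables, and given the known STRING_ORD bug the derived ratios should be treated as provisional.

```java
import java.util.Locale;

public class SortThroughputRatios {

    /** Plain ratio a / b, used for both throughput and warm-up comparisons. */
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Best searches/sec per variant, copied from the tables above.
        double trunk = 107.1;
        double stringOrd = 102.0;
        double stringVal = 34.6;

        // XSearchWarm elapsed seconds per variant, copied from the tables above.
        double trunkWarm = 93.64;
        double ordWarm = 2.16;
        double valWarm = 2.24;

        System.out.printf(Locale.ROOT, "STRING_ORD / trunk throughput:      %.3f%n",
                ratio(stringOrd, trunk));
        System.out.printf(Locale.ROOT, "STRING_VAL / trunk throughput:      %.3f%n",
                ratio(stringVal, trunk));
        System.out.printf(Locale.ROOT, "STRING_ORD / STRING_VAL throughput: %.2f%n",
                ratio(stringOrd, stringVal));
        System.out.printf(Locale.ROOT, "trunk / STRING_ORD warm-up time:    %.1f%n",
                ratio(trunkWarm, ordWarm));
        System.out.printf(Locale.ROOT, "trunk / STRING_VAL warm-up time:    %.1f%n",
                ratio(trunkWarm, valWarm));
    }
}
```

On these figures STRING_ORD runs at roughly 95% of trunk's search throughput and STRING_VAL at roughly a third, while warm-up on trunk takes over 40 times longer than with either patch.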
I think we now need to fix the STRING_ORD bug & retest.
> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.