[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

Michael McCandless (JIRA) Mon, 15 Dec 2008 11:56:06 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656728#action_12656728
 ]


Michael McCandless commented on LUCENE-1483:
--------------------------------------------


OK, I ran an initial test, though since the ord approach is still a
bit buggy we can't be sure how far to trust these results.

I indexed the first 2M docs from Wikipedia into a 101-segment index,
then searched for "text" (97K hits), sorting by title and pulling the
best 100 hits.  I ran the search 1000 times in each round.

Current trunk (best 107.1 searches/sec):
{code}
Operation            round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
XSearchWarm              0        1            1          0.0       93.64   463,373,760  1,029,046,272
XSearchWithSort_1000 -   0 -  -   1 -  -  - 1000 -  -   100.6 -  -   9.94 - 463,373,760  1,029,046,272
XSearchWithSort_1000     1        1         1000        107.1        9.34   572,969,344  1,029,046,272
XSearchWithSort_1000 -   2 -  -   1 -  -  - 1000 -  -   105.5 -  -   9.48 - 572,969,344  1,029,046,272
XSearchWithSort_1000     3        1         1000        106.2        9.41   587,068,928  1,029,046,272
{code}

Patch STRING_ORD (best 102.0 searches/sec):
{code}
Operation            round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
XSearchWarm              0        1            1          0.5        2.16   384,153,600  1,029,046,272
XSearchWithSort_1000 -   0 -  -   1 -  -  - 1000 -  -  - 94.1 -  -  10.63 - 439,173,824  1,029,046,272
XSearchWithSort_1000     1        1         1000        100.7        9.93   439,173,824  1,029,046,272
XSearchWithSort_1000 -   2 -  -   1 -  -  - 1000 -  -   101.9 -  -   9.81 - 573,822,208  1,029,046,272
XSearchWithSort_1000     3        1         1000        102.0        9.81   573,822,208  1,029,046,272
{code}

Patch STRING_VAL (best 34.6 searches/sec):
{code}
Operation            round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
XSearchWarm              0        1            1          0.4        2.24   368,201,088  1,029,046,272
XSearchWithSort_1000 -   0 -  -   1 -  -  - 1000 -  -  - 34.6 -  -  28.94 - 415,107,648  1,029,046,272
XSearchWithSort_1000     1        1         1000         33.9       29.54   415,107,648  1,029,046,272
XSearchWithSort_1000 -   2 -  -   1 -  -  - 1000 -  -  - 33.9 -  -  29.46 - 545,339,904  1,029,046,272
XSearchWithSort_1000     3        1         1000         34.0       29.40   545,339,904  1,029,046,272
{code}
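
A rough cross-check of the three tables can be sketched as below.  It
assumes rec/s is approximately recsPerRun / elapsedSec (the printed
columns are rounded, so the match is not exact) and compares the best
round of each configuration as a ratio.  The class and method names are
made up for illustration and are not part of the benchmark package.

```java
import java.util.Locale;

public class ThroughputSummary {
    /** Searches per second for a round of `searches` queries that took `elapsedSec`. */
    public static double searchesPerSec(int searches, double elapsedSec) {
        return searches / elapsedSec;
    }

    /** Ratio of two throughputs; 0.95 means "95% as fast as the baseline". */
    public static double relative(double candidate, double baseline) {
        return candidate / baseline;
    }

    public static void main(String[] args) {
        // Best rec/s per configuration, copied from the tables above.
        double trunk = 107.1, stringOrd = 102.0, stringVal = 34.6;

        System.out.printf(Locale.ROOT, "STRING_ORD vs trunk:      %.2f%n",
                relative(stringOrd, trunk));
        System.out.printf(Locale.ROOT, "STRING_VAL vs trunk:      %.2f%n",
                relative(stringVal, trunk));
        System.out.printf(Locale.ROOT, "STRING_ORD vs STRING_VAL: %.2f%n",
                relative(stringOrd, stringVal));

        // Sanity check of one row: 1000 searches in 9.34 sec (trunk, round 1).
        System.out.printf(Locale.ROOT, "trunk round 1 rec/s:      %.1f%n",
                searchesPerSec(1000, 9.34));
    }
}
```

With the numbers above, STRING_ORD comes out at roughly 0.95 of trunk,
STRING_VAL at roughly 0.32 of trunk, and STRING_ORD at about 2.9x
STRING_VAL.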


Notes:

  * Populating the field cache on trunk for MultiReader is
    fantastically costly (94 sec, versus about 2 sec with either
    patch).  The IO cache was already hot so this isn't IO latency.
    I think MultiTermEnum/Docs behaves badly for this use case (single
    unique term (title) per doc).  We really need to switch to
    column-stride fields, rather than un-inverting, for this.

  * For this case at least STRING_ORD is still quite a bit faster than
    STRING_VAL; however, it's still buggy.  Maybe a smaller queue size
    (e.g. 10 or 20) would bring them closer.

  * STRING_ORD is still a bit slower than trunk's sort; hopefully once
    tuned it'll be closer.
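
To illustrate why an ord-based comparison can be cheaper than a
value-based one, and why ords need care once there are multiple
segments, here is a minimal sketch.  It is not the code from the patch
and not necessarily related to the bug mentioned above; SegmentCache,
compareByOrd and compareByVal are hypothetical names, and the sketch
assumes each segment keeps its own sorted array of unique values plus
one ord per doc.

```java
import java.util.Arrays;

public class OrdVsValSketch {
    /** Hypothetical per-segment sort cache: sorted unique values plus one ord per doc. */
    public static final class SegmentCache {
        public final String[] lookup; // sorted unique field values in this segment
        public final int[] ords;      // ords[doc] = index of the doc's value in lookup

        public SegmentCache(String[] docValues) {
            lookup = Arrays.stream(docValues).distinct().sorted().toArray(String[]::new);
            ords = new int[docValues.length];
            for (int doc = 0; doc < docValues.length; doc++) {
                ords[doc] = Arrays.binarySearch(lookup, docValues[doc]);
            }
        }

        public String value(int doc) {
            return lookup[ords[doc]];
        }
    }

    /** Ord-style comparison: a single int compare, meaningful only within one segment. */
    public static int compareByOrd(SegmentCache seg, int docA, int docB) {
        return Integer.compare(seg.ords[docA], seg.ords[docB]);
    }

    /** Value-style comparison: a full String compare, meaningful across segments. */
    public static int compareByVal(SegmentCache segA, int docA, SegmentCache segB, int docB) {
        return segA.value(docA).compareTo(segB.value(docB));
    }

    public static void main(String[] args) {
        SegmentCache s0 = new SegmentCache(new String[] {"banana", "apple"});
        SegmentCache s1 = new SegmentCache(new String[] {"cherry", "avocado"});

        // Within one segment, ord order and value order agree.
        System.out.println(compareByOrd(s0, 0, 1) > 0);         // banana sorts after apple
        System.out.println(compareByVal(s0, 0, s0, 1) > 0);

        // Across segments, equal ords do not imply equal values:
        // "apple" (s0) and "avocado" (s1) are both ord 0 in their own segment.
        System.out.println(s0.ords[1] == s1.ords[1]);
        System.out.println(compareByVal(s0, 1, s1, 1) < 0);     // apple sorts before avocado
    }
}
```

The point of the sketch is only that ords from different segments live
in different numbering spaces, so a queue that holds hits from several
segments needs either the values or some remapping of ords before it
can compare them.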

I think we now need to fix the STRING_ORD bug & retest.


> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.


