[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659359#action_12659359
 ] 

Mark Miller commented on LUCENE-1483:
-------------------------------------

{quote}
Given how different the results are, depending on how many segments
index has, queue size, how many hits search gets, etc., I think we
need a dynamic solution, meaning in certain situations (many hits,
small queue depth, small number of large segments) you use ORD but
other times you use ORDDEM.
{quote}

Sounds interesting...

{quote}
So I'm thinking setNextReader should return a new comparator? Often
it would simply return itself, but if it deems it worthwhile to switch
eg from ORD to ORDDEM it would switch to ORDDEM and return that.
{quote}

I like that, I think. The only other option I see offhand is a comparator that
can do both, but that is not as clean and probably adds a check in tightly
looped code.
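
A rough sketch of the "setNextReader returns a comparator" idea, just to make 
the shape concrete. Everything here is a made-up stand-in rather than the real 
Lucene classes or the patch: the SegmentComparator interface, the size-based 
threshold heuristic, and the class names are all assumptions for illustration.

```java
// Hypothetical stand-in types; none of this is Lucene's actual API.
interface SegmentComparator {
    // Returns the comparator to use for the next segment: usually `this`,
    // occasionally a different implementation that takes over.
    SegmentComparator setNextReader(int segmentSize, int docBase);

    String name();
}

final class OrdComparator implements SegmentComparator {
    // Assumed heuristic: switch once segments get smaller than this.
    private final int smallSegmentThreshold;

    OrdComparator(int smallSegmentThreshold) {
        this.smallSegmentThreshold = smallSegmentThreshold;
    }

    @Override
    public SegmentComparator setNextReader(int segmentSize, int docBase) {
        if (segmentSize < smallSegmentThreshold) {
            // A real version would also hand over the current queue state.
            return new OrdDemComparator();
        }
        return this;
    }

    @Override
    public String name() {
        return "ORD";
    }
}

final class OrdDemComparator implements SegmentComparator {
    @Override
    public SegmentComparator setNextReader(int segmentSize, int docBase) {
        return this; // never switches back in this sketch
    }

    @Override
    public String name() {
        return "ORDDEM";
    }
}

final class SwitchingDemo {
    // Walks the segments and records which comparator handled each one.
    static String run(int[] segmentSizes, int threshold) {
        SegmentComparator cmp = new OrdComparator(threshold);
        StringBuilder used = new StringBuilder();
        int docBase = 0;
        for (int size : segmentSizes) {
            // The caller must keep whatever instance comes back.
            cmp = cmp.setNextReader(size, docBase);
            if (used.length() > 0) {
                used.append(',');
            }
            used.append(cmp.name());
            docBase += size;
        }
        return used.toString();
    }
}
```

The point is that the per-hit compare path stays free of any "which mode am I 
in" check; the decision is made once per segment, and the caller just has to 
remember to use the returned instance.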

{quote}
Then I also thought of a wild possible change: when doing searching,
it'd be best to visit the segments from largest to smallest, doing ORD
in the beginning and switching to ORDDEM at some point. So, could we
do this? I think we only "require" in-order docs within a segment, so
could we switch up segment order. We'd need to fix setNextReader API
to take in reader & docBase. Would that work?
{quote}

I think this could work well. Since you are likely to have a few large
segments, ORD would be fastest there; then, as you move through the many small
segments, ORDDEM would likely work best. Is largest to smallest best, though?
You do get to map onto smaller term[] arrays as you go, but that causes more
fallback. You are also likely to be carrying more hits in the queue into the
next reader, right? Going from smallest to largest, you likely have fewer hits
to map as you hit the big segments, and more room to fit in, for less fallback.
So the question is, what obvious piece am I missing :)

Largest to smallest, you fill the queue faster and earlier, so there is more to
convert as you hit all the other segments, but I guess that will be heavily
mitigated by on-demand conversion. You will convert slot.min, and if nothing
beats it, I guess that's it... so not so bad actually. And if you go smallest
to largest, I guess the queue won't be full, so there will be more 'wins' into
the queue, which will cause more conversions over the small segments... in
which case, for a ton of them and a big queue, largest to smallest seems
better. I still feel like I'm missing something, but I guess I have convinced
myself largest to smallest is the way to go.
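
To make the reordering concrete, here is a small sketch of why setNextReader 
would need the docBase passed in: once segments are visited out of index order, 
the base can no longer be accumulated as you go. The Segment and SegmentOrder 
classes are hypothetical stand-ins for illustration, not Lucene types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a segment reader: a name, a size, and the
// docBase the segment has in normal index order.
final class Segment {
    final String name;
    final int maxDoc;
    final int docBase;

    Segment(String name, int maxDoc, int docBase) {
        this.name = name;
        this.maxDoc = maxDoc;
        this.docBase = docBase;
    }
}

final class SegmentOrder {
    // Assign docBases in index order, before any reordering happens.
    static List<Segment> withDocBases(String[] names, int[] sizes) {
        List<Segment> segments = new ArrayList<>();
        int docBase = 0;
        for (int i = 0; i < names.length; i++) {
            segments.add(new Segment(names[i], sizes[i], docBase));
            docBase += sizes[i];
        }
        return segments;
    }

    // Return a copy sorted largest segment first; each segment carries its
    // original docBase along, so global doc ids still come out right.
    static List<Segment> largestFirst(List<Segment> segments) {
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingInt((Segment s) -> s.maxDoc).reversed());
        return ordered;
    }
}
```

With segments a, b, c of sizes 10, 1000, 100, the visit order becomes b, c, a 
with docBases 10, 1010, 0, which is what the searcher would hand to 
setNextReader along with each reader.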

I'm probably not the first to wish there were more hours in the day...

I'll put up a patch with the better testing soon just in case.





> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
