[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659359#action_12659359 ]
Mark Miller commented on LUCENE-1483:
-------------------------------------

{quote}
Given how different the results are, depending on how many segments index has, queue size, how many hits search gets, etc., I think we need a dynamic solution, meaning in certain situations (many hits, small queue depth, small number of large segments) you use ORD but other times you use ORDDEM.
{quote}

Sounds interesting...

{quote}
So I'm thinking setNextReader should return a new comparator? Often it would simply return itself, but if it deems it worthwhile to switch eg from ORD to ORDDEM it would switch to ORDDEM and return that.
{quote}

I like that, I think. The only other option I see offhand is a comparator that can do both, but that is not as clean and probably adds a check in tightly looped code.

{quote}
Then I also thought of a wild possible change: when doing searching, it'd be best to visit the segments from largest to smallest, doing ORD in the beginning and switching to ORDDEM at some point. So, could we do this? I think we only "require" in-order docs within a segment, so could we switch up segment order. We'd need to fix setNextReader API to take in reader & docBase. Would that work?
{quote}

I think this could work well. Since you are likely to have a few large segments, ORD would be fastest there; then, as you move through the many small segments, ORDDEM would likely work best.

Is largest to smallest best, though? You do get to map onto smaller term[] arrays as you go, but that causes more fallback. You are also likely to be carrying more hits in the queue into the next reader, right? From smallest to largest, you likely have fewer hits to map as you hit the big segments, and more room to fit in, for less fallback. So the question is, what obvious piece am I missing :)

Largest to smallest, you fill the queue faster earlier, so there is more to convert as you hit all the other segments, but I guess that will be heavily mitigated by on demand.
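To make the two quoted ideas concrete, here is a rough sketch of how they might fit together: setNextReader takes the segment and its docBase and returns the comparator to use from then on, and segments are visited largest first. Everything in it is hypothetical and for illustration only (Segment, SegmentComparator, the two comparator classes, and the size threshold used to decide when to switch). It is not the API from the patch, and the real switch condition would also have to weigh queue size and hit counts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a segment reader: name, doc count, and docBase. */
final class Segment {
    final String name;
    final int maxDoc;
    final int docBase; // offset of this segment's first doc within the whole index

    Segment(String name, int maxDoc, int docBase) {
        this.name = name;
        this.maxDoc = maxDoc;
        this.docBase = docBase;
    }
}

/** Hypothetical comparator API: setNextReader returns the comparator to use next. */
interface SegmentComparator {
    SegmentComparator setNextReader(Segment segment, int docBase);
    String strategy();
}

/** Stands in for the ORD comparator; switches to ORDDEM once segments get small. */
final class OrdComparator implements SegmentComparator {
    private final int smallSegmentThreshold; // assumed heuristic, not from the issue

    OrdComparator(int smallSegmentThreshold) {
        this.smallSegmentThreshold = smallSegmentThreshold;
    }

    @Override
    public SegmentComparator setNextReader(Segment segment, int docBase) {
        // Usually return this; hand over to ORDDEM when the segment is small.
        if (segment.maxDoc < smallSegmentThreshold) {
            return new OrdDemComparator();
        }
        return this;
    }

    @Override
    public String strategy() { return "ORD"; }
}

/** Stands in for the ORDDEM comparator; never switches back in this sketch. */
final class OrdDemComparator implements SegmentComparator {
    @Override
    public SegmentComparator setNextReader(Segment segment, int docBase) { return this; }

    @Override
    public String strategy() { return "ORDDEM"; }
}

final class SegmentOrderSketch {
    /** Assigns docBase in index order, then returns the segments largest first. */
    static List<Segment> largestFirst(String[] names, int[] maxDocs) {
        List<Segment> segments = new ArrayList<>();
        int docBase = 0;
        for (int i = 0; i < names.length; i++) {
            segments.add(new Segment(names[i], maxDocs[i], docBase));
            docBase += maxDocs[i];
        }
        segments.sort(Comparator.comparingInt((Segment s) -> s.maxDoc).reversed());
        return segments;
    }

    /** Walks the segments and records which strategy handled each one. */
    static List<String> plan(List<Segment> segments, int threshold) {
        List<String> out = new ArrayList<>();
        SegmentComparator comparator = new OrdComparator(threshold);
        for (Segment s : segments) {
            // The comparator may replace itself here, outside the per-doc loop.
            comparator = comparator.setNextReader(s, s.docBase);
            out.add(s.name + ":" + comparator.strategy() + "@" + s.docBase);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> segments = largestFirst(
                new String[] {"a", "b", "c", "d"}, new int[] {50, 100000, 20, 3000});
        System.out.println(plan(segments, 1000));
    }
}
```

Because each segment carries its own docBase, the visiting order no longer has to match index order, and the ORD/ORDDEM decision is made once per segment rather than per document, which keeps the extra check out of the tight loop.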
You will convert slot.min, and if nothing beats it, I guess that's it... so not so bad actually. And if you go smallest to largest, I guess the queue won't be full, so there will be more 'wins' into the queue, which will cause more conversions over the small segments... in which case, for a ton of them and a big queue, largest to smallest seems better.

Still feel like I'm missing something, but I guess I have convinced myself largest to smallest is the way to go.

I'm probably not the first to wish there were more hours in the day... I'll put up a patch with the better testing soon just in case.

> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.