[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655616#action_12655616 ]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------

OK, I ran a quick perf test on a 100 segment index with 1 million docs (10K docs per segment), for a single TermQuery ("text"), and I'm seeing an 11.1% speedup (best of 4: 20.36s -> 18.11s) with this patch, on Mac OS X. On Linux I see a 6.3% speedup (best of 4: 23.31s -> 21.84s). A single segment index shows no difference, as expected.

I think the speedup is due to avoiding the extra method call plus the 2nd pass through the int docs[] to add in the doc base, in MultiSegmentReader.MultiTermDocs.read(int[] docs, int[] freqs).

This is a nice "side effect", i.e. in addition to getting faster reopen performance (the original goal here), we get a bump in single term search performance.

I think given this, we should cut over the other search methods (sort-by-relevance, custom HitCollector) to this approach? Maybe if we add a new Scorer.score method that can accept a "docBase" which it adds into the doc() before calling collect()? In fact, if we do that, we may not even need the new MultiReaderTopFieldDocCollector at all?

Hmm, though, a Scorer may override that score(HitCollector), e.g. BooleanScorer does. Maybe we have to make a wrapper HitCollector that simply adds the docBase to the doc and then invokes the real HitCollector.collect? Though that costs us an extra method call per collect().
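To make the wrapper idea above concrete, here is a minimal sketch of what such a collector could look like. It is an illustration under assumptions, not code from the patch: the `HitCollector` class below is a local stand-in with the same `collect(int doc, float score)` shape as Lucene's, and `DocBaseHitCollector`, `setDocBase` and `DocBaseDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Lucene's HitCollector, declared here only so the
// sketch compiles on its own.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical wrapper: shifts per-segment doc ids into the top-level
// doc id space, then forwards the hit to the real collector.
final class DocBaseHitCollector extends HitCollector {
    private final HitCollector delegate;
    private int docBase;

    DocBaseHitCollector(HitCollector delegate) {
        this.delegate = delegate;
    }

    // Assumed to be called once before each segment is searched.
    void setDocBase(int docBase) {
        this.docBase = docBase;
    }

    @Override
    public void collect(int doc, float score) {
        // This is the extra method call per hit mentioned above.
        delegate.collect(docBase + doc, score);
    }
}

class DocBaseDemo {
    public static void main(String[] args) {
        final List<Integer> seen = new ArrayList<Integer>();
        HitCollector real = new HitCollector() {
            @Override
            public void collect(int doc, float score) {
                seen.add(doc);
            }
        };
        DocBaseHitCollector shifted = new DocBaseHitCollector(real);

        // Pretend there are three segments of 10K docs each, and that
        // local doc 3 matches in every one of them.
        int docsPerSegment = 10000;
        for (int segment = 0; segment < 3; segment++) {
            shifted.setDocBase(segment * docsPerSegment);
            shifted.collect(3, 1.0f);
        }
        System.out.println(seen);
    }
}
```

The real collector never sees segment-local ids, so existing HitCollector implementations would not need to change; the cost is the one extra virtual call per hit.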
Here's the alg I used (slightly modified from the one above):

{code}
merge.factor=1000
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.tokenized=true
doc.term.vector=false
doc.add.log.step=100000
max.buffered=10000
ram.flush.mb=1000

work.dir = /lucene/work

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file = test.queries

task.max.depth.log=2

log.queries=true

{ "Populate"
  -CreateIndex
  { "MAddDocs" AddDoc(100) > : 1000000
  -CloseIndex
}

{ "Rounds"
  { "Run"
    { "TestSortSpeed"
      OpenReader
      { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1
      { "SearchWithSort" SearchWithSort(sort_field) > : 500
      CloseReader
    }
    NewRound
  } : 4
}

RepSumByPrefRound SearchWithSort
{code}

It creates the index once, then does 4 rounds of searching with the single query "text" in test.queries (SimpleQueryMaker was creating other queries that were getting 0 or 1 hits).

I'm running with "java -Xms1024M -Xmx1024M -Xbatch -server"; java is 1.6.0_07 on Mac Pro OS X 10.5.5 and 1.6.0_10-rc on 2.6.22.1 linux kernel.

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual
> IndexReader level and reloading the fieldcache on reopen can be much faster
> as only changed segments need to be reloaded.