[ https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614974#action_12614974 ]
Michael McCandless commented on LUCENE-753:
-------------------------------------------

I created a large index (Wikipedia indexed 4 times over, with stored fields and term vector offsets/positions = 72 GB).

I then randomly sampled 50 terms > 1 million freq, plus 200 terms > 100,000 freq, plus 100 terms > 10,000 freq, plus 100 terms > 1000 freq. Then I warmed the OS so these queries are fully cached in the IO cache. It's a highly synthetic test. I'd really love to test on real queries instead of single-term queries.

Then I ran this alg:

{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
query.maker = org.apache.lucene.benchmark.byTask.feeds.FileBasedQueryMaker
file.query.maker.file = /lucene/wikiQueries.txt
directory=FSDirectory
pool=true
work.dir=/lucene/bigwork

OpenReader
{ "Warmup" SearchTrav(20) > : 5
{ "Rounds"
  [{ "Search" Search > : 500]: 16
  NewRound
}: 2
CloseReader

RepSumByPrefRound Search
{code}

I ran with 2, 4, 8 and 16 threads on an Intel quad-core Mac Pro (2 CPUs, each dual core) running OS X 10.5.4, with 6 GB RAM, Sun JRE 1.6.0_05 and a single WD Velociraptor hard drive. To keep the number of searches constant I changed the 500 count above to match the thread count (i.e. with 8 threads I changed 500 -> 1000, with 4 threads I changed it to 2000, etc.).

Here are the results. Each run is the best of 2, and all searches are fully cached in the OS's IO cache:

||Number of Threads||Patch rec/s||Trunk rec/s||Pctg gain||
|2|78.7|74.9|5.1%|
|4|74.1|68.2|8.7%|
|8|37.7|32.7|15.3%|
|16|19.2|16.3|17.8%|

I also ran the same alg, replacing the Search task with SearchTravRet(10) (which retrieves the first 10 docs (hits) of each search), again warming first so it's all fully cached:

||Number of Threads||Patch rec/s||Trunk rec/s||Pctg gain||
|2|1589.6|1519.8|4.6%|
|4|1460.9|1395.3|4.7%|
|8|748.9|676.0|10.8%|
|16|382.7|338.4|13.1%|

So there are smallish gains, but remember these are upper bounds on the gains because no pooling is happening. I'll test uncached next.
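For anyone checking the numbers, here is a minimal sketch of the arithmetic behind the "Pctg gain" column and the per-thread search counts. It assumes the gain is computed as patch rec/s over trunk rec/s minus one, and that the constant total is 16 x 500 = 8000 searches per round as implied by the alg above. The class and method names are made up for illustration and are not part of the benchmark package.

```java
// Hypothetical helper, not part of contrib/benchmark: recomputes the
// "Pctg gain" column and the per-thread repeat count from the figures above.
class GainCheck {

    // Percentage gain of the patched build over trunk, given rec/s figures.
    static double pctGain(double patchRecPerSec, double trunkRecPerSec) {
        return (patchRecPerSec / trunkRecPerSec - 1.0) * 100.0;
    }

    // Repeat count per thread that keeps the total number of searches fixed.
    // Assumes totalSearches is evenly divisible by threads.
    static int searchesPerThread(int totalSearches, int threads) {
        return totalSearches / threads;
    }

    public static void main(String[] args) {
        // threads, patch rec/s, trunk rec/s (first table, Search task)
        double[][] rows = {
            {2, 78.7, 74.9},
            {4, 74.1, 68.2},
            {8, 37.7, 32.7},
            {16, 19.2, 16.3},
        };
        int totalSearches = 16 * 500; // from "[{ "Search" Search > : 500]: 16"
        for (double[] r : rows) {
            int threads = (int) r[0];
            System.out.printf("threads=%d perThread=%d gain=%.1f%%%n",
                    threads,
                    searchesPerThread(totalSearches, threads),
                    pctGain(r[1], r[2]));
        }
    }
}
```

The same two methods can be pointed at the SearchTravRet(10) table by swapping in its rec/s figures.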
> Use NIO positional read to avoid synchronization in FSIndexInput
> ----------------------------------------------------------------
>
>                 Key: LUCENE-753
>                 URL: https://issues.apache.org/jira/browse/LUCENE-753
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Yonik Seeley
>         Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java,
> FileReadTest.java, FileReadTest.java, FileReadTest.java, FileReadTest.java,
> FSDirectoryPool.patch, FSIndexInput.patch, FSIndexInput.patch,
> lucene-753.patch, lucene-753.patch
>
>
> As suggested by Doug, we could use NIO pread to avoid synchronization on the
> underlying file.
> This could mitigate any MT performance drop caused by reducing the number of
> files in the index format.
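To illustrate what "NIO pread" refers to in the issue summary, here is a rough sketch of a positional read using `FileChannel.read(ByteBuffer, long)`. That call reads at an absolute offset and does not modify the channel's own position, so several threads can share one channel without a seek-then-read critical section. This is not the code from the attached patches; the class name, `readAt` and `demo` are hypothetical, and the `java.nio.file` helpers need a newer JRE than the 1.6 used in the benchmark above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: not the FSIndexInput patch attached to the issue.
class PositionalReadSketch {

    // Reads up to len bytes starting at absolute offset pos.
    // FileChannel.read(ByteBuffer, long) leaves the channel position untouched,
    // so callers do not need to synchronize on the channel.
    static byte[] readAt(FileChannel channel, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long offset = pos;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, offset);
            if (n < 0) {
                break; // reached end of file before len bytes were read
            }
            offset += n;
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Writes a small temp file and reads two slices out of order
    // through the same channel.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("pread", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    String first = new String(readAt(ch, 3, 4), StandardCharsets.US_ASCII);
                    String second = new String(readAt(ch, 0, 2), StandardCharsets.US_ASCII);
                    return first + "," + second;
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

By contrast, a seek followed by a read on a shared `RandomAccessFile` has to be guarded by a lock, which is the synchronization the issue title wants to avoid.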