[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538112 ]
Ning Li commented on LUCENE-1035:
---------------------------------

> I'll change to "OR" queries and see what happens.

Query set with an average of 590K results, retrieving docids for the first 5K:

Buffer Pool Size    Hit Ratio    Queries per second
0                   N/A          1.9
16M                 53%          1.9
32M                 68%          2.0
64M                 90%          2.3
128M/256M/512M      99%          2.3

As Yonik pointed out, in the previous "AND" tests the bottleneck is the system call that moves data from the file system cache to userspace. In the "OR" tests, far fewer such calls are made, so the speedup is less significant. I wish I could get a real query workload for this dataset.

> Actually, phrase queries would be really interesting too since they hit the
> term positions.

Phrase queries are rare and term distribution is highly skewed, according to the following study of the Excite query log:

Spink, Amanda and Xu, Jack L. (2000) "Selected results from a large study of Web searching: the Excite study". Information Research, 6(1). Available at: http://InformationR.net/ir/6-1/paper90.html

"4. Phase Searching: Phrases (terms enclosed by quotation marks) were seldom, while only 1 in 16 queries contained a phrase - but correctly used.
5. Search Terms: Distribution: Jansen, et al., (2000) report the distribution of the frequency of use of terms in queries as highly skewed."

I didn't find a good one on the AOL query log.

In any case, this buffer pool is not intended for general-purpose use. I mentioned RAMDirectory earlier. This is more like an alternative to RAMDirectory (that's why it's per directory): you want persistent storage for the index, and the index is small enough that you still want RAMDirectory-like search performance. In addition, the entire index doesn't have to fit into memory, as long as the most queried part does. Hopefully, this benefits a subset of Lucene use cases.

> did you compare it against MMAP?

The index I experimented on didn't fit in memory...
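To make the idea above concrete (only the most queried part of the index needs to stay in memory, and the hit ratio is what determines the benefit), here is a minimal sketch of a fixed-capacity page cache with LRU eviction that also counts hits and misses. It is an illustration only and is not taken from LUCENE-1035.patch: the class name LruBufferPool, the PageLoader interface, the page-number keys, and the choice of LRU as the eviction policy are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; NOT the implementation in LUCENE-1035.patch.
class LruBufferPool {
    // Assumed callback that reads one page from the underlying file on a miss.
    interface PageLoader {
        byte[] load(long pageNumber);
    }

    private final int capacityPages;
    private final LinkedHashMap<Long, byte[]> pages;
    private long hits;
    private long misses;

    LruBufferPool(int capacityPages) {
        this.capacityPages = capacityPages;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                // Evict the least recently used page once the pool is over capacity.
                return size() > LruBufferPool.this.capacityPages;
            }
        };
    }

    // Returns the cached page, or loads and caches it on a miss.
    synchronized byte[] getPage(long pageNumber, PageLoader loader) {
        byte[] page = pages.get(pageNumber);
        if (page != null) {
            hits++;
            return page;
        }
        misses++;
        page = loader.load(pageNumber);
        if (capacityPages > 0) {
            pages.put(pageNumber, page);
        }
        return page;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }

    // Fraction of lookups served from memory; NaN before any lookup.
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? Double.NaN : (double) hits / total;
    }
}
```

With a skewed access pattern, a pool much smaller than the index can still serve most lookups from memory, which is the situation the hit-ratio column in the table above describes. A real implementation would also need to deal with page size, concurrency beyond a single lock, and invalidation when index files change.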
> Optional Buffer Pool to Improve Search Performance
> --------------------------------------------------
>
>                 Key: LUCENE-1035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1035
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Ning Li
>         Attachments: LUCENE-1035.patch
>
>
> Index in RAMDirectory provides better performance over that in FSDirectory.
> But many indexes cannot fit in memory or applications cannot afford to
> spend that much memory on index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. a reasonable hit ratio can be achieved, it should provide
> a good improvement over FSDirectory.