[ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch updated LUCENE-1195:
----------------------------------

    Attachment: lucene-1195.patch

The previous patch had a silly thread-safety problem, which is fixed now. Some threads in the TestIndexReaderReopen test occasionally hit errors (I changed the test case so that it now fails whenever an error is hit).

I made some other changes to TermInfosReader. Instead of two ThreadLocals for the SegmentTermEnum and the Cache, there is now a small inner class called ThreadResources that holds references to those two objects. I also minimized the number of ThreadLocal.get() calls by passing the enumerator around. Furthermore, I got rid of the private scanEnum() method and inlined it into the get() method to fix the above-mentioned thread-safety problem. I also realized that the cache itself does not have to be thread-safe, because we put it into a ThreadLocal.

I reran the same performance test that I ran for the first patch, and this version seems to be even faster: 107 secs vs. 112 secs with the first patch (~30% improvement compared to trunk, 152 secs). All tests pass, including the improved TestIndexReaderReopen.testThreadSafety(), which I ran multiple times.

OK, I think this patch is ready now; I'm planning to commit it in a day or so.

> Performance improvement for TermInfosReader
> -------------------------------------------
>
>                 Key: LUCENE-1195
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1195
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: lucene-1195.patch, lucene-1195.patch, lucene-1195.patch
>
>
> Currently we have a bottleneck for multi-term queries: the dictionary lookup
> is being done twice for each term. The first time in Similarity.idf(), where
> searcher.docFreq() is called. The second time when the posting list is opened
> (TermDocs or TermPositions).
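The pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the class and member names (ThreadResourcesSketch, LruCache, scan(), the mock SegmentTermEnum and TermInfo stand-ins) are assumptions, and the cache size of 20 is taken from the issue description. It shows a single per-thread holder object so that one ThreadLocal.get() call suffices per lookup, and a small LRU cache that needs no synchronization because each thread owns its own copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThreadResourcesSketch {

  // Stand-ins for Lucene's SegmentTermEnum and TermInfo (names assumed).
  static class SegmentTermEnum { /* dictionary scan state would live here */ }

  static class TermInfo {
    final int docFreq;
    TermInfo(int docFreq) { this.docFreq = docFreq; }
  }

  // A tiny LRU cache; it need not be thread-safe, because each thread
  // gets its own instance via the ThreadLocal below.
  static class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    LruCache(int maxSize) {
      super(16, 0.75f, true); // access-order = true gives LRU eviction
      this.maxSize = maxSize;
    }
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
      return size() > maxSize;
    }
  }

  // One holder per thread instead of two separate ThreadLocals.
  static class ThreadResources {
    final SegmentTermEnum termEnum = new SegmentTermEnum();
    final LruCache<String, TermInfo> cache = new LruCache<>(20);
  }

  private final ThreadLocal<ThreadResources> resources =
      ThreadLocal.withInitial(ThreadResources::new);

  // Checks the per-thread cache before falling back to the expensive
  // dictionary scan, so a term already looked up for docFreq() is served
  // from cache when its posting list is opened right afterwards.
  public TermInfo get(String term) {
    ThreadResources r = resources.get(); // the single ThreadLocal.get()
    TermInfo ti = r.cache.get(term);
    if (ti == null) {
      ti = scan(r.termEnum, term);       // stands in for the real lookup
      r.cache.put(term, ti);
    }
    return ti;
  }

  private TermInfo scan(SegmentTermEnum e, String term) {
    return new TermInfo(term.length()); // mock result for the sketch
  }
}
```

Passing the enumerator into scan() mirrors the change described above: the scan logic receives the per-thread state as an argument instead of fetching it from a ThreadLocal again.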
> The dictionary lookup is not cheap, which is why a significant performance
> improvement is possible here if we avoid the second lookup. An easy way to do
> this is to add a small LRU cache to TermInfosReader.
>
> I ran some performance experiments with an LRU cache size of 20 and a
> mid-size index of 500,000 documents from Wikipedia. Here are some test
> results:
>
> 50,000 AND queries with 3 terms each:
>   old: 152 secs
>   new (with LRU cache): 112 secs (26% faster)
>
> 50,000 OR queries with 3 terms each:
>   old: 175 secs
>   new (with LRU cache): 133 secs (24% faster)
>
> For bigger indexes this patch will probably have less impact; for smaller
> ones, more.
>
> I will attach a patch soon.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.