2.4.0 Performance in TermInfosReader term caching (New implementation of
SimpleLRUCache)
----------------------------------------------------------------------------------------
Key: LUCENENET-190
URL: https://issues.apache.org/jira/browse/LUCENENET-190
Project: Lucene.Net
Issue Type: Improvement
Environment: v2.4.0
Reporter: Digy
Priority: Minor
Below is the mail from Michael Garski about the performance of term caching in
TermInfosReader. It would be good to have a faster LRUCache implementation in
Lucene.Net.
DIGY
{quote}
Doug did an amazing job of porting 2.4.0, doing it mostly on his own!
Hooray Doug!
We are using the committed version of 2.4.0 in production and I wanted to share
a performance issue we discovered and what we've done to work around it. From
the Java Lucene change log: "LUCENE-1195: Improve term lookup performance by
adding a LRU cache to the TermInfosReader. In performance experiments the
speedup was about 25% on average on mid-size indexes with ~500,000 documents
for queries with 3 terms and about 7% on larger indexes with ~4.3M documents."
The Java implementation uses a LinkedHashMap within the class
org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at
maintaining the cache. As there is no equivalent collection in .NET, the
current 2.4.0 port uses a combination of a LinkedList to maintain LRU state and
a HashTable to provide lookups. While this implementation works, maintaining
the LRU state via the LinkedList creates a fair amount of overhead and can
result in a significant reduction of performance, most likely attributable to the
LinkedList.Remove method being O(n). As each thread maintains its own cache of
1024 terms, the overhead of performing the removal is a drain on performance.
At this time we have disabled the cache in the method TermInfosReader.TermInfo
Get(Term term, bool useCache) by always setting the useCache parameter to false
inside the body of the method. After doing this we saw performance return
to the 2.3.2 levels. I have not yet had the opportunity to experiment with
other implementations within the SimpleLRUCache to address the performance
issue. One approach that might solve the issue is to use the
HashedLinkedList<T> class provided in the C5 collection library
[http://www.itu.dk/research/c5/].
Michael
Michael Garski
Search Architect
MySpace.com
www.myspace.com/michaelgarski <http://www.myspace.com/mgarski>
{quote}
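For illustration, the LinkedHashMap approach mentioned in the mail can be
sketched in Java roughly as follows. This is a minimal sketch and not the
actual org.apache.lucene.util.cache.SimpleLRUCache source; the class name
LruCacheSketch is hypothetical. It relies on LinkedHashMap's access-order mode
and its removeEldestEntry hook, so both lookups and LRU bookkeeping stay
constant time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not the real Lucene SimpleLRUCache.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        // accessOrder = true: entries are ordered from least to most
        // recently accessed, and get()/put() move an entry to the end.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the
        // least recently used entry once the capacity is exceeded.
        return size() > capacity;
    }
}
```

A .NET port would need a different structure, since there is no direct
equivalent collection. One option, stated here as an assumption rather than
the approach taken in Lucene.Net, is to keep a dictionary from key to
LinkedListNode and call LinkedList<T>.Remove(LinkedListNode<T>), which is
O(1), instead of the O(n) LinkedList<T>.Remove(T).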