[
https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741587#action_12741587
]
Digy commented on LUCENENET-190:
--------------------------------
Hi Michael,
{quote}
One other thing I noticed is that the use of the incrementing long TimeStamp
value could lead to collisions in the SortedList, resulting in an incorrect LRU
state in the cache.
{quote}
1 - Its incremented with every *Get* or *Put* . So no collision can occur.
{quote}
In the patch you provided, a LinkedList is no longer used but rather a
SortedList, however both the Add and Remove methods are O[n]
{quote}
2 - No. its O[logN] since binary search can be used.
3 - About the performance issues: After I posted this code I compared it with
the code in the current trunk.
{code}
void TestCacheSpeed(Lucene.Net.Util.Cache.Cache cache)
{
Random rnd = new Random(0);
int loopCount = 100000;
int DistinctTerms = 500000;
long start = DateTime.Now.Ticks;
for (int i = 0; i < loopCount; i++)
{
//cache.Put(i, i);
//cache.Get(i);
int key = rnd.Next(DistinctTerms);
cache.Put(key, rnd.Next(DistinctTerms));
cache.Get(rnd.Next(DistinctTerms));
cache.Get(rnd.Next(DistinctTerms));
cache.Get(key);
cache.Get(rnd.Next(DistinctTerms));
cache.Get(rnd.Next(DistinctTerms));
}
long end = DateTime.Now.Ticks;
rtf.AppendText(((float)loopCount) / ((end - start) / 10000) + "
loops/msec. \n");
}
{code}
Yes, it is a dummy test case, but, after playing with some parameters(# of
puts or gets, capacity, hit ratio etc.), there was a speed up from 50 to 200
times. (Yes It was 200 times faster, not joking).
DIGY
> 2.4.0 Performance in TermInfosReader term caching (New implementation of
> SimpleLRUCache)
> ----------------------------------------------------------------------------------------
>
> Key: LUCENENET-190
> URL: https://issues.apache.org/jira/browse/LUCENENET-190
> Project: Lucene.Net
> Issue Type: Improvement
> Environment: v2.4.0
> Reporter: Digy
> Priority: Minor
> Attachments: SimpleLRUCache.rar
>
>
> Below is the mail from Michael Garski about the Performance in
> TermInfosReader term caching. It would be good to have a faster LRUCache
> implementation in Lucene.Net
> DIGY
> {quote}
> Doug did an amazing job of porting 2.4.0, doing it mostly on his own!
> Hooray Doug!
> We are using the committed version of 2.4.0 in production and I wanted to
> share a performance issue we discovered and what we've done to work around
> it. From the Java Lucene change log: "LUCENE-1195: Improve term lookup
> performance by adding a LRU cache to the TermInfosReader. In performance
> experiments the speedup was about 25% on average on mid-size indexes with
> ~500,000 documents for queries with 3 terms and about 7% on larger indexes
> with ~4.3M documents."
> The Java implementation uses a LinkedHashMap within the class
> org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at
> maintaining the cache. As there is no equivalent collection in .Net The
> current 2.4.0 port uses a combination of a LinkedList to maintain LRU state
> and a HashTable to provide lookups. While this implementation works,
> maintaining the LRU state via the LinkedList creates a fair amount of
> overhead and can result in a significant reduction of performance, most
> likely attributed to the LinkedList.Remove method being O(n). As each thread
> maintains its own cache of 1024 terms, these overhead in performing the
> removal is a drain on performance.
> At this time we have disabled the cache in the method
> TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the
> useCache parameter to false inside the body of the method. After doing this
> we saw performance return back to the 2.3.2 levels. I have not yet had the
> opportunity to experiment with other implementations within the
> SimpleLRUCache to address the performance issue. One approach that would
> might solve the issue is to use the HashedLinkedList<T> class provided in the
> C5 collection library [http://www.itu.dk/research/c5/].
> Michael
> Michael Garski
> Search Architect
> MySpace.com
> www.myspace.com/michaelgarski <http://%27www.myspace.com/mgarski>
> {quote}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.