SortedDictionary does not have a getter with index (like Data[Index]). In a 
SortedList, Data[0] is always the LRU item. 


I got much more better results with RedBlackCS.RedBlack ( 
http://www.codeproject.com/KB/recipes/redblackcs.aspx ) but it is huge and 
there may be licensing problems.


DIGY.


-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, August 18, 2009 4:51 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in 
TermInfosReader term caching (New implementation of SimpleLRUCache)

SimpleLRUCache_LUCENENET_190 uses SortedList<long, object> collection.

Performance of SortedList (see 
http://msdn.microsoft.com/en-us/library/ms132339.aspx): 
1) Add method is an O(n) operation for unsorted data. It is an O(log n) 
operation if the new element is added at the end of the list. 
If insertion causes a resize, the operation is O(n).
2) Remove method method is an O(n) operation
3) RemoveAt method is an O(n) operation 
4) Keys property is an O(1) operation

Why not to use SortedDictionary<>? It has better performance for Remove and Add:

1) Add method is an O(log n) operation

2) Remove method is an O(log n) operation

3) Keys property is an O(1) operation


-- 
Iliev Andrei


Digy (JIRA) :
>      [ 
> https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
>
> Digy updated LUCENENET-190:
> ---------------------------
>
>     Attachment: SimpleLRUCache.rar
>
> A slightly faster implementation + a test case for SimpleLRUCache.
>
> DIGY
>
>   
>> 2.4.0 Performance in TermInfosReader term caching (New implementation of 
>> SimpleLRUCache)
>> ----------------------------------------------------------------------------------------
>>
>>                 Key: LUCENENET-190
>>                 URL: https://issues.apache.org/jira/browse/LUCENENET-190
>>             Project: Lucene.Net
>>          Issue Type: Improvement
>>         Environment: v2.4.0
>>            Reporter: Digy
>>            Priority: Minor
>>         Attachments: cache_Gen2.PNG, SimpleLRUCache.rar, TermInfosReader.rar
>>
>>
>> Below is the mail from Michael Garski about the Performance in 
>> TermInfosReader term caching. It would be good to have a faster LRUCache 
>> implementation in Lucene.Net
>> DIGY
>> {quote}
>> Doug did an amazing job of porting 2.4.0, doing it mostly on his own!  
>> Hooray Doug!
>> We are using the committed version of 2.4.0 in production and I wanted to 
>> share a performance issue we discovered and what we've done to work around 
>> it.  From the Java Lucene change log:  "LUCENE-1195: Improve term lookup 
>> performance by adding a LRU cache to the TermInfosReader. In performance 
>> experiments the speedup was about 25% on average on mid-size indexes with 
>> ~500,000 documents for queries with 3 terms and about 7% on larger indexes 
>> with ~4.3M documents."
>> The Java implementation uses a LinkedHashMap within the class 
>> org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at 
>> maintaining the cache.  As there is no equivalent collection in .Net The 
>> current 2.4.0 port uses a combination of a LinkedList to maintain LRU state 
>> and a HashTable to provide lookups.  While this implementation works, 
>> maintaining the LRU state via the LinkedList creates a fair amount of 
>> overhead and can result in a significant reduction of performance, most 
>> likely attributed to the LinkedList.Remove method being O(n).  As each 
>> thread maintains its own cache of 1024 terms, these overhead in performing 
>> the removal is a drain on performance.
>> At this time we have disabled the cache in the method 
>> TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the 
>> useCache parameter to false inside the body of the method.  After doing this 
>> we saw performance return back to the 2.3.2 levels.  I have not yet had the 
>> opportunity to experiment with other implementations within the 
>> SimpleLRUCache to address the performance issue.  One approach that would 
>> might solve the issue is to use the HashedLinkedList<T> class provided in 
>> the C5 collection library [http://www.itu.dk/research/c5/].
>> Michael
>> Michael Garski
>> Search Architect
>> MySpace.com
>> www.myspace.com/michaelgarski <http://%27www.myspace.com/mgarski>
>> {quote}
>>     
>
>   


Reply via email to