Digy :
Thanks for the code. I made some (primitive) performance tests. I seems to be 
~20-25% slower.
Thanks for test. I also run a very basic test.
For default capacity of 1024 SortedDictionary is a little bit slower if the number of calling Put method (cache.Put(random.Next(iter), new object()))< 10000 times. It becames about 50% faster if you call Put method ~5120000 times. If you increase cache capacity to 2048, SortedDictionary runs faster about 30% even for PutN~10000. If you increase cache capacity to 4096, SortedDictionary runs faster about 100% even for PutN~10000.

From the theory and my test (see tables below) it looks like SortedList implementation performs as O(capacity). SortedDictionary performs as O(1) (does not depends on capacity).


capcity=1024

PutN SortedList SortedDictionary 10000 34.2282 51.2173 20000 61.2381 53.2568 40000 149.6492 112.5803 80000 305.0159 238.1283 160000 722.9921 517.4155 320000 1441.9587 964.7425 640000 2750.0694 1904.8576 1280000 5527.8896 3716.7638 2560000 11473.272 7413.9198 5120000 22338.1336 14675.7207

capcity=2048

PutN SortedList SortedDictionary 10000 38.4206 25.7387 20000 89.5416 52.4525 40000 248.9026 114.3436 80000 423.9398 237.9287 160000 1073.6732 498.2754 320000 2095.6281 1036.9142 640000 4409.9442 2072.3321 1280000 9199.8789 4156.8881 2560000 18007.9836 8145.6435 5120000 36166.0517 16050.2191
capcity=4096

PutN SortedList SortedDictionary 10000 57.3052 25.1141 20000 192.4663 46.6878 40000 514.9243 116.4451 80000 1032.0922 263.1914 160000 2423.9152 512.848 320000 5151.876 1032.4438 640000 10333.3283 2155.9971 1280000 20939.848 4406.8069 2560000 42860.7345 8891.0645 5120000 85135.4939 17589.0027 --------
Andrei
DIGY

-----Original Message-----
From: [email protected] [mailto:[email protected]] Sent: Tuesday, August 18, 2009 8:19 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in 
TermInfosReader term caching (New implementation of SimpleLRUCache)

Digy :
I couldn't find a method like Keys.First().
I found what is the problem. Method First() exists only for Framework 3.5. For Framework 2.0 some work around requied :

SortedDictionary<long, object>.Enumerator EnumTimeStamps = TimeStamps.GetEnumerator(); // This method is an O(1) operation EnumTimeStamps.MoveNext(); // Shoud be an O(1) operation. long key = EnumTimeStamps.Current.Key;
Ugly, but it works.

Why don't you send your own working copy of LRUCache. If it is faster, we can 
use it.
I modified SimpleLRUCache_LUCENENET_190 to use a SortedDictionary.

Only few lines of code have to be modified:
1) Declaration:
-  SortedList<long, object> TimeStamps = new SortedList<long, object>();
+ System.Collections.Generic.SortedDictionary<long, object> TimeStamps = new SortedDictionary<long, object>();
2) Put method:
if (Data.Count > Capacity)
                {
SortedDictionary<long, object>.Enumerator EnumTimeStamps = TimeStamps.GetEnumerator();
                    EnumTimeStamps.MoveNext();
long key = EnumTimeStamps.Current.Key; Data.Remove(TimeStamps[key]);
                    TimeStamps.Remove(key);
                }


It passed Nunit test. I didn't make any performance test yet.

----------
Andrei
DIGY


-----Original Message-----
From: [email protected] [mailto:[email protected]] Sent: Tuesday, August 18, 2009 5:58 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in 
TermInfosReader term caching (New implementation of SimpleLRUCache)

Digy пишет:
This time you will get an error with "TimeStamps.Keys[0]". To get the key at index 0, you 
have to copy the "keys" to a temporary array.

Ok i missed that Keys returns KeyCollection instead of an array. But you can use Keys.First() instead of Keys[0]. It should be fast.


DIGY.


-----Original Message-----
From: [email protected] [mailto:[email protected]] Sent: Tuesday, August 18, 2009 5:24 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in 
TermInfosReader term caching (New implementation of SimpleLRUCache)

Digy wrote:
SortedDictionary does not have a getter with index (like Data[Index]). In a SortedList, Data[0] is always the LRU item.
I didn't find any use of index access for SortedList (TimeStamps) except of : if (Data.Count > Capacity)
                {
                    long key = TimeStamps.Keys[0];
                    Data.Remove(TimeStamps[key]);
                    TimeStamps.RemoveAt(0); <-------------
                }

 In case of  SortedDictionary it can be replaced by:
                if (Data.Count > Capacity)
                {
                    long key = TimeStamps.Keys[0];
                    Data.Remove(TimeStamps[key]);
                    TimeStamps.Remove(key);
                }

Am I missed something?

---
Andrei
I got much more better results with RedBlackCS.RedBlack ( 
http://www.codeproject.com/KB/recipes/redblackcs.aspx ) but it is huge and 
there may be licensing problems.


DIGY.


-----Original Message-----
From: [email protected] [mailto:[email protected]] Sent: Tuesday, August 18, 2009 4:51 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in 
TermInfosReader term caching (New implementation of SimpleLRUCache)

SimpleLRUCache_LUCENENET_190 uses SortedList<long, object> collection.

Performance of SortedList (see http://msdn.microsoft.com/en-us/library/ms132339.aspx): 1) Add method is an O(n) operation for unsorted data. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n).
2) Remove method method is an O(n) operation
3) RemoveAt method is an O(n) operation 4) Keys property is an O(1) operation

Why not to use SortedDictionary<>? It has better performance for Remove and Add:

1) Add method is an O(log n) operation

2) Remove method is an O(log n) operation

3) Keys property is an O(1) operation


<http://www.garweb.ru>







Reply via email to