Digy :
Thanks for the code. I made some (primitive) performance tests. I seems to be
~20-25% slower.
Thanks for test. I also run a very basic test.
For default capacity of 1024 SortedDictionary is a little bit slower if
the number of calling Put method (cache.Put(random.Next(iter), new
object()))< 10000 times. It becames about 50% faster if you call Put
method ~5120000 times.
If you increase cache capacity to 2048, SortedDictionary runs faster
about 30% even for PutN~10000.
If you increase cache capacity to 4096, SortedDictionary runs faster
about 100% even for PutN~10000.
From the theory and my test (see tables below) it looks like
SortedList implementation performs as O(capacity). SortedDictionary
performs as O(1) (does not depends on capacity).
capcity=1024
PutN SortedList SortedDictionary
10000 34.2282 51.2173
20000 61.2381 53.2568
40000 149.6492 112.5803
80000 305.0159 238.1283
160000 722.9921 517.4155
320000 1441.9587 964.7425
640000 2750.0694 1904.8576
1280000 5527.8896 3716.7638
2560000 11473.272 7413.9198
5120000 22338.1336 14675.7207
capcity=2048
PutN SortedList SortedDictionary
10000 38.4206 25.7387
20000 89.5416 52.4525
40000 248.9026 114.3436
80000 423.9398 237.9287
160000 1073.6732 498.2754
320000 2095.6281 1036.9142
640000 4409.9442 2072.3321
1280000 9199.8789 4156.8881
2560000 18007.9836 8145.6435
5120000 36166.0517 16050.2191
capcity=4096
PutN SortedList SortedDictionary
10000 57.3052 25.1141
20000 192.4663 46.6878
40000 514.9243 116.4451
80000 1032.0922 263.1914
160000 2423.9152 512.848
320000 5151.876 1032.4438
640000 10333.3283 2155.9971
1280000 20939.848 4406.8069
2560000 42860.7345 8891.0645
5120000 85135.4939 17589.0027
--------
Andrei
DIGY
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, August 18, 2009 8:19 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in
TermInfosReader term caching (New implementation of SimpleLRUCache)
Digy :
I couldn't find a method like Keys.First().
I found what is the problem. Method First() exists only for Framework
3.5. For Framework 2.0 some work around requied :
SortedDictionary<long, object>.Enumerator
EnumTimeStamps = TimeStamps.GetEnumerator(); // This method is an O(1)
operation
EnumTimeStamps.MoveNext(); // Shoud be an O(1)
operation.
long key =
EnumTimeStamps.Current.Key;
Ugly, but it works.
Why don't you send your own working copy of LRUCache. If it is faster, we can
use it.
I modified SimpleLRUCache_LUCENENET_190 to use a SortedDictionary.
Only few lines of code have to be modified:
1) Declaration:
- SortedList<long, object> TimeStamps = new SortedList<long, object>();
+ System.Collections.Generic.SortedDictionary<long, object> TimeStamps =
new SortedDictionary<long, object>();
2) Put method:
if (Data.Count > Capacity)
{
SortedDictionary<long, object>.Enumerator
EnumTimeStamps = TimeStamps.GetEnumerator();
EnumTimeStamps.MoveNext();
long key =
EnumTimeStamps.Current.Key;
Data.Remove(TimeStamps[key]);
TimeStamps.Remove(key);
}
It passed Nunit test. I didn't make any performance test yet.
----------
Andrei
DIGY
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, August 18, 2009 5:58 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in
TermInfosReader term caching (New implementation of SimpleLRUCache)
Digy пишет:
This time you will get an error with "TimeStamps.Keys[0]". To get the key at index 0, you
have to copy the "keys" to a temporary array.
Ok i missed that Keys returns KeyCollection instead of an array. But
you can use Keys.First() instead of Keys[0]. It should be fast.
DIGY.
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, August 18, 2009 5:24 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in
TermInfosReader term caching (New implementation of SimpleLRUCache)
Digy wrote:
SortedDictionary does not have a getter with index (like Data[Index]). In a SortedList, Data[0] is always the LRU item.
I didn't find any use of index access for SortedList (TimeStamps) except
of :
if (Data.Count > Capacity)
{
long key = TimeStamps.Keys[0];
Data.Remove(TimeStamps[key]);
TimeStamps.RemoveAt(0); <-------------
}
In case of SortedDictionary it can be replaced by:
if (Data.Count > Capacity)
{
long key = TimeStamps.Keys[0];
Data.Remove(TimeStamps[key]);
TimeStamps.Remove(key);
}
Am I missed something?
---
Andrei
I got much more better results with RedBlackCS.RedBlack (
http://www.codeproject.com/KB/recipes/redblackcs.aspx ) but it is huge and
there may be licensing problems.
DIGY.
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, August 18, 2009 4:51 PM
To: [email protected]
Subject: Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in
TermInfosReader term caching (New implementation of SimpleLRUCache)
SimpleLRUCache_LUCENENET_190 uses SortedList<long, object> collection.
Performance of SortedList (see http://msdn.microsoft.com/en-us/library/ms132339.aspx):
1) Add method is an O(n) operation for unsorted data. It is an O(log n) operation if the new element is added at the end of the list.
If insertion causes a resize, the operation is O(n).
2) Remove method method is an O(n) operation
3) RemoveAt method is an O(n) operation
4) Keys property is an O(1) operation
Why not to use SortedDictionary<>? It has better performance for Remove and Add:
1) Add method is an O(log n) operation
2) Remove method is an O(log n) operation
3) Keys property is an O(1) operation
<http://www.garweb.ru>