Hi,

Well, I have results from a 1M index for now (the index contains 100K documents duplicated 10 times, so it's not the final test I'll run, but it still shows something). I ran 2000 short queries (2.4 keywords on average) against the 1M-doc index, after a 50-query warm-up. Following are the results:
Current TopDocCollector:
------------------------------------
num queries: 2000
numDocs=1000000
total time: 15910 ms
avg time: 7.955 ms
avg. allocations: 79.7445
total allocation time: 0
avg. num results: 54841.26

Modified TopDocCollector:
-------------------------------------
num queries: 2000
numDocs=1000000
total time: 15909 ms
avg time: 7.9545 ms
avg. allocations: 9.8565
total allocation time: 0
avg. num results: 54841.26

As you can see, the actual allocation time is really negligible and there isn't much difference in the avg. running times of the queries. However, the *current* runs performed a lot worse at the beginning, before the OS cache warmed up. The only significant difference is the number of allocations: the modified TDC and PQ allocate ~90% (!) fewer objects. This is significant, especially in heavily loaded systems.

Tomorrow I will run the same test on a 10M (unique) documents index, but I think the picture is getting clear ...

On Dec 10, 2007 6:15 PM, Paul Elschot <[EMAIL PROTECTED]> wrote:

> On Monday 10 December 2007 09:19:43 Shai Erera wrote:
> > I agree w.r.t. the current implementation; however, in the worst case (as
> > we tend to consider when talking about computer algorithms), it will
> > allocate a ScoreDoc per result. With the overflow reuse, it will not
> > allocate those objects, no matter what input it gets.
> > Also, notice that there is a performance hit with the current
> > implementation of TopDocCollector: it first checks that the document
> > should be inserted, and if so, PQ does the same check again.
> > So, if you change the current implementation to always attempt an insert,
> > you'd gain some performance improvement as well.
>
> One could consider a specialized priority queue, somewhat like
> ScorerDocQueue, with an insert operation using the arguments
> as in the collect() call instead of a ScoreDoc object.
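[To make that idea concrete, here is a rough sketch of such a specialized queue. The class and method names below are mine, not Lucene's. Primitive doc/score pairs go into parallel arrays backing a 1-based min-heap with the least score on top, as in HitQueue, so collecting never allocates a per-hit object:]

```java
// Hypothetical sketch, not Lucene code: a fixed-size min-heap over parallel
// int/float arrays, fed by primitives instead of a ScoreDoc object.
class DocScoreQueue {
    private final int[] docs;       // 1-based heap of doc ids
    private final float[] scores;   // 1-based heap of scores (min on top)
    private int size;

    DocScoreQueue(int maxSize) {
        docs = new int[maxSize + 1];
        scores = new float[maxSize + 1];
    }

    int size() { return size; }
    int topDoc() { return docs[1]; }
    float topScore() { return scores[1]; }

    /** Insert without allocating; drops the entry if it is not competitive. */
    void insert(int doc, float score) {
        if (size < docs.length - 1) {          // still room: grow the heap
            size++;
            docs[size] = doc;
            scores[size] = score;
            upHeap();
        } else if (size > 0 && score > scores[1]) {
            docs[1] = doc;                     // replace current minimum ...
            scores[1] = score;
            downHeap();                        // ... and restore heap order
        } // else: score too small; nothing to do, and nothing was allocated
    }

    private void upHeap() {
        int i = size;
        while (i > 1 && scores[i] < scores[i >> 1]) {
            swap(i, i >> 1);
            i >>= 1;
        }
    }

    private void downHeap() {
        int i = 1;
        while (true) {
            int child = i << 1;
            if (child > size) break;
            if (child + 1 <= size && scores[child + 1] < scores[child]) child++;
            if (scores[child] >= scores[i]) break;
            swap(i, child);
            i = child;
        }
    }

    private void swap(int a, int b) {
        int d = docs[a]; docs[a] = docs[b]; docs[b] = d;
        float s = scores[a]; scores[a] = scores[b]; scores[b] = s;
    }
}
```

[In real code this would still need things like HitQueue's tie-breaking by doc id; it is only meant to show that a collect(int, float)-shaped insert removes the per-hit allocation entirely.]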
> I don't know where HitQueue is currently used, but that could be
> the place to inline PriorityQueue instead of inheriting from it.
> Otherwise, TopDocCollector could be used for inlining PriorityQueue.
>
> Although practical tests are always good, an extra performance test
> with the worst case of slowly increasing score values should be quite
> helpful for comparing with the current implementation and for getting
> the last bits of performance out of this.
>
> Regards,
> Paul Elschot
>
> > On Dec 10, 2007 10:15 AM, Paul Elschot <[EMAIL PROTECTED]> wrote:
> > > The current TopDocCollector only allocates a ScoreDoc when the given
> > > score causes a new ScoreDoc to be added into the queue, but it does
> > > not reuse anything that overflows out of the queue.
> > > So, reusing the overflow out of the queue should reduce object
> > > allocations, especially for indexes that tend to have better-scoring
> > > docs at the end. I wouldn't expect a 30% improvement out of that,
> > > but it would help, if only to reduce occasional performance
> > > deteriorations.
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > > On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> > > > Hi,
> > > >
> > > > Lucene's PQ implements two methods: put() (assumes the PQ has room
> > > > for the object) and insert() (checks whether the object can be
> > > > inserted, etc.). The implementation of insert() requires the
> > > > application that uses it to allocate a new object every time it
> > > > calls insert. Specifically, it cannot reuse the objects that were
> > > > removed from the PQ.
> > > > I've done some measurements on the performance that search would
> > > > gain by using that method, through the change of TopDocCollector.
> > > > PriorityQueue change (added insertWithOverflow method)
> > > > -----------------------------------------------------------------------------------
> > > > /**
> > > >  * insertWithOverflow() is similar to insert(), except for its return
> > > >  * value: it returns the object (if any) that was dropped off the heap
> > > >  * because it was full. This can be the given parameter (in case it is
> > > >  * smaller than the full heap's minimum, and couldn't be added) or
> > > >  * another object that was previously the smallest value in the heap
> > > >  * and has now been replaced by a larger one.
> > > >  */
> > > > public Object insertWithOverflow(Object element) {
> > > >   if (size < maxSize) {
> > > >     put(element);
> > > >     return null;
> > > >   } else if (size > 0 && !lessThan(element, top())) {
> > > >     Object ret = heap[1];
> > > >     heap[1] = element;
> > > >     adjustTop();
> > > >     return ret;
> > > >   } else {
> > > >     return element;
> > > >   }
> > > > }
> > > > [Very similar to insert(), only it returns the object that was kicked
> > > > out of the queue, or null]
> > > >
> > > > TopDocCollector's current implementation of collect()
> > > > ----------------------------------------------------------------------------
> > > > public void collect(int doc, float score) {
> > > >   if (score > 0.0f) {
> > > >     totalHits++;
> > > >     if (hq.size() < numHits || score >= minScore) {
> > > >       hq.insert(new ScoreDoc(doc, score));
> > > >       minScore = ((ScoreDoc) hq.top()).score; // maintain minScore
> > > >     }
> > > >   }
> > > > }
> > > > [See how it allocates a new ScoreDoc every time this method is called]
> > > >
> > > > TopDocCollector's new implementation of collect()
> > > > -----------------------------------------------------------------------------
> > > > public void collect(int doc, float score) {
> > > >   if (score == 0.0f) {
> > > >     return;
> > > >   }
> > > >   totalHits++;
> > > >   if (hq.size() < numHits || score >= minScore) {
> > > >     if (sd == null) {
> > > >       sd = new ScoreDoc(doc, score);
> > > >     } else {
> > > >       sd.doc = doc;
> > > >       sd.score = score;
> > > >     }
> > > >     sd = (ScoreDoc) hq.insertWithOverflow(sd);
> > > >     minScore = ((ScoreDoc) hq.top()).score; // maintain minScore
> > > >   }
> > > > }
> > > > [sd is a class member of the collector, of type ScoreDoc]
> > > >
> > > > Now for the performance tests. I've done two tests:
> > > > 1. Calling TopDocCollector.collect() 1,000,000, 10,000,000 and
> > > > 100,000,000 times using both implementations. Here are the results:
> > > >
> > > >                       1M       10M       100M
> > > > Current Collector     218 ms   1,907 ms  20,000 ms
> > > > Modified Collector    141 ms   1,297 ms  12,672 ms
> > > >
> > > > As you can see, the new implementation is 30-40% faster.
> > > >
> > > > 2. Wrote some tasks in the benchmark package that make use of the
> > > > new PQ while executing a real search over an index with 1M documents
> > > > of small size (10 characters).
> > > > Here are the results:
> > > >
> > > > Current TopDocCollector
> > > > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > > > Operation        round  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
> > > > CreateIndex          0       1           1      10.6        0.09   2,219,112    4,389,376
> > > > AddDocs_1000000      0       1     1000000  52,075.2       19.20  34,497,176   52,689,408
> > > > CloseIndex           0       2           1       0.1       16.62  26,178,972   51,378,688
> > > > OpenIndex            0       1           1   1,000.0        0.00  48,771,304   50,067,968
> > > > --> MySearch         0       1           1       5.3        0.19  48,771,304   50,067,968
> > > >
> > > > Modified TopDocCollector
> > > > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > > > Operation        round  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
> > > > CreateIndex          0       1           1      10.6        0.09   2,207,008    4,389,376
> > > > AddDocs_1000000      0       1     1000000  50,955.4       19.62  32,531,992   44,825,088
> > > > CloseIndex           0       2           1       0.1       16.84  57,853,148   61,929,984
> > > > OpenIndex            0       1           1   1,000.0        0.00  57,792,136   61,929,984
> > > > --> MySearch         0       1           1       7.1        0.14  57,939,856   61,929,984
> > > >
> > > > Notice the time difference in search (0.14 modified vs. 0.19 current).
> > > > Here too, a 30% improvement.
> > > >
> > > > One thing that I wasn't able to show, but I think is pretty clear:
> > > > the new implementation causes a lot fewer object allocations. Consider
> > > > the typical search for 10 results over an index with 1M documents.
> > > > The current implementation will allocate up to 1M (!)
> > > > ScoreDoc instances, while the new one will allocate only 11 (10 for
> > > > the PQ and 1 for reusing). On heavily loaded systems, this will
> > > > result in far less work for the GC.
> > > >
> > > > I would like to suggest reflecting this new implementation in
> > > > PriorityQueue and also modifying TopDocCollector to make use of the
> > > > new method. Several ways to do it:
> > > > 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
> > > > accept better ones), deprecate insert() (let's say in 2.3), and in
> > > > the release after that (2.4?) rename insertWithOverflow to insert().
> > > > A complex process, but it won't break existing applications' code.
> > > > 2. Change insert's signature and state that applications that move
> > > > to 2.3 (for example) need to change their code as well (actually it
> > > > won't compile).
> > > > 3. Any other alternatives?
> > > >
> > > > And, of course, review the classes in the Lucene code that use PQ
> > > > and modify them as necessary.
> > > >
> > > > A very long email for a very simple (but important) performance
> > > > improvement.
> > > >
> > > > Shai
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

--
Regards,
Shai Erera
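P.S. To make the "10 for the PQ and 1 for reusing" arithmetic concrete, here is a minimal, self-contained sketch of the reuse pattern. The names ReuseDemo/Entry are mine, not Lucene's, java.util.PriorityQueue stands in for Lucene's PQ, and the competitive check is simplified to a strict comparison:

```java
// Hypothetical demo of the insertWithOverflow idea: the evicted entry is
// handed back to the collector and recycled on the next competitive hit,
// so a top-N collection allocates at most N + 1 entry objects in total.
import java.util.PriorityQueue;

class ReuseDemo {
    static class Entry {
        static int allocations = 0;   // counts every Entry ever created
        int doc;
        float score;
        Entry(int doc, float score) { this.doc = doc; this.score = score; allocations++; }
    }

    final int numHits;
    final PriorityQueue<Entry> hq;    // min-heap on score, like HitQueue
    Entry spare;                      // the recycled "overflow" object

    ReuseDemo(int numHits) {
        this.numHits = numHits;
        this.hq = new PriorityQueue<>(numHits, (a, b) -> Float.compare(a.score, b.score));
    }

    // Simplified collect(): reuses 'spare' instead of allocating, and stores
    // the evicted minimum back into 'spare' for the next call.
    void collect(int doc, float score) {
        if (hq.size() < numHits) {
            hq.add(new Entry(doc, score));        // filling up: numHits allocations
        } else if (score > hq.peek().score) {     // competitive hit
            Entry e = (spare == null) ? new Entry(doc, score) : spare;
            e.doc = doc;
            e.score = score;
            spare = hq.poll();                    // evict the smallest, keep it for reuse
            hq.add(e);
        } // else: not competitive; nothing allocated at all
    }
}
```

Collecting 1M hits into a top-10 queue this way creates exactly 11 Entry objects: 10 while the queue fills, plus one the first time a competitive hit arrives before any eviction has produced a spare.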