Hi Lucene's PQ implements two methods: put (assumes the PQ has room for the object) and insert (checks whether the object can be inserted etc.). The implementation of insert() requires the application that uses it to allocate a new object every time it calls insert. Specifically, it cannot reuse the objects that were removed from the PQ. I've done some measurements on the performance that search would gain by using that method, through the change of TopDocCollector.
PriorityQueue change (added insertWithOverflow method) ----------------------------------------------------------------------------------- /** * insertWithOverflow() is similar to insert(), except its return value: it * returns the object (if any) that was dropped off the heap because it was * full. This can be the given parameter (in case it is smaller than the * full heap's minimum, and couldn't be added) or another object that was * previously the smallest value in the heap and now has been replaced by a * larger one. */ public Object insertWithOverflow(Object element) { if (size < maxSize) { put(element); return null; } else if (size > 0 && !lessThan(element, top())) { Object ret = heap[1]; heap[1] = element; adjustTop(); return ret; } else { return element; } } [Very similar to insert(), only it returns the object that was kicked out of the Queue, or null] TopDocCollector's current implementation of collect() ---------------------------------------------------------------------------- public void collect(int doc, float score) { if (score > 0.0f) { totalHits++; if (hq.size() < numHits || score >= minScore) { hq.insert(new ScoreDoc(doc, score)); minScore = ((ScoreDoc)hq.top()).score; // maintain minScore } } } [See how it allocates a new ScoreDoc every time this method is called] TopDocCollector's new implementation of collect() ----------------------------------------------------------------------------- public void collect(int doc, float score) { if (score == 0.0f) { return; } totalHits++; if (hq.size() < numHits || score >= minScore) { if (sd == null) { sd = new ScoreDoc(doc, score); } else { sd.doc = doc; sd.score = score; } sd = (ScoreDoc) hq.insertWithOverflow(sd); minScore = ((ScoreDoc)hq.top()).score; // maintain minScore } } [sd is a class memeber of the collector, of type ScoreDoc] Now for the performance tests. I've done two tests: 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and 100,000,000 times using both implementations. Here are the results: 1M 10M 100M Current Collector 218 ms 1,907ms 20,000ms Modified Collector 141 ms 1,297ms 12,672ms As you can see, the new implementation 30-40% faster. 2. Wrote some tasks in the benchmark package that makes use of the new PQ while executing a real search over an index with 1M documents, of small size (10 characters). Here are the results: Current TopDocCollector ----------------------------------------------------------------------------------------------------------------------------------------------- ------------> Report Sum By (any) Name (5 about 6 out of 6) Operation round runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem CreateIndex 0 1 1 10.6 0.09 2,219,112 4,389,376 AddDocs_1000000 - 0 - - 1 - - 1000000 - 52,075.2 - - 19.20 - 34,497,176 - 52,689,408 CloseIndex 0 2 1 0.1 16.62 26,178,972 51,378,688 OpenIndex - - - 0 - - 1 - - - - 1 - - 1,000.0 - - 0.00 - 48,771,304 - 50,067,968 --> MySearch 0 1 1 5.3 0.19 48,771,304 50,067,968 Modified TopDocCollector ----------------------------------------------------------------------------------------------------------------------------------------------- ------------> Report Sum By (any) Name (5 about 6 out of 6) Operation round runCnt recsPerRun rec/s elapsedSec avgUsedMem avgTotalMem CreateIndex 0 1 1 10.6 0.09 2,207,008 4,389,376 AddDocs_1000000 - 0 - - 1 - - 1000000 - 50,955.4 - - 19.62 - 32,531,992 - 44,825,088 CloseIndex 0 2 1 0.1 16.84 57,853,148 61,929,984 OpenIndex - - - 0 - - 1 - - - - 1 - - 1,000.0 - - 0.00 - 57,792,136 - 61,929,984 --> MySearch 0 1 1 7.1 0.14 57,939,856 61,929,984 Notice the time difference in search (0.14 modified vs. 0.19 current). Here too, a 30% improvement. One thing that I wasn't able to show, but I think it's pretty much clear --> the new implementation causes a lot less object allocations. Consider the typical search for 10 results over an index with 1M documents. The current implementation will allocate 1M (!) ScoreDoc instances, while the new one will allocate 11 (10 for the PQ and 1 for reusing). On heavily loaded systems, this will result in far less work for the GC. I would like to suggest to reflect this new implementation in PriorityQueue and also modify TopDocCollector to make use of this new method. Several ways to do it: 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to accept better ones), make insert() deprecated (let's say in 2.3) and release after that (2.4?) rename insertWithOverflow to insert(). A complex process, but it won't break existing applications' code. 2. Change insert's signature and state that applications that move to 2.3(for example), need to change their code as well (actually it won't compile). 3. Any other alternatives? And .. of course, review the classes in the Lucene code that use PQ and modify them as necessary. A very long email for a very simple (but important) performance improvement. Shai