Re: Performance Improvement for Search using PriorityQueue

Michael McCandless Mon, 10 Dec 2007 01:01:05 -0800

Have you done any tests on real queries to see what impact thisimprovement has in practice? Or, to measure how many ScoreDocs are"typically" allocated?


Mike

Shai Erera wrote:

I agree w.r.t the current implementation, however in the worse case(as wetend to consider when talking about computer algorithms), it willallocate aScoreDoc per result. With the overflow reuse, it will not allocatethose
objects, no matter what's the input it gets.
Also, notice that there is a performance hit with the currentimplementationof TopDocCollector: it first checks that the document should beinserted and
if so, PQ does the same check again.
So, if you change the current implementation to always attempt aninsert,
you'd gain some performance improvements as well.

On Dec 10, 2007 10:15 AM, Paul Elschot <[EMAIL PROTECTED]> wrote:
The current TopDocCollector only allocates a ScoreDoc when the given
score causes a new ScoreDoc to be added into the queue, but it does
not reuse anything that overflows out of the queue.
So, reusing the overflow out of the queue should reduce object
allocations. especially for indexes that tend to have better scoring
docs at the end. I wouldn't expect a 30% improvement out of that,
but it would help, if only to reduce occasional performance
deteriorations.

Regards,
Paul Elschot


On Monday 10 December 2007 08:11:50 Shai Erera wrote:
Hi
Lucene's PQ implements two methods: put (assumes the PQ has roomfor theobject) and insert (checks whether the object can be insertedetc.). The
implementation of insert() requires the application that uses it to
allocate
a new object every time it calls insert. Specifically, it cannotreuse
the
objects that were removed from the PQ.
I've done some measurements on the performance that search wouldgain by
using that method, through the change of TopDocCollector.

PriorityQueue change (added insertWithOverflow method)
-----------------------------------------------------------------------------------
    /**
* insertWithOverflow() is similar to insert(), except itsreturn
value:
it
* returns the object (if any) that was dropped off the heapbecause
it
was
* full. This can be the given parameter (in case it issmaller than
the
     * full heap's minimum, and couldn't be added) or another object
that
was
     * previously the smallest value in the heap and now has been
replaced
by a
     * larger one.
     */
    public Object insertWithOverflow(Object element) {
        if (size < maxSize) {
            put(element);
            return null;
        } else if (size > 0 && !lessThan(element, top())) {
            Object ret = heap[1];
            heap[1] = element;
            adjustTop();
            return ret;
        } else {
            return element;
        }
    }
[Very similar to insert(), only it returns the object that waskicked
out of
the Queue, or null]

TopDocCollector's current implementation of collect()
----------------------------------------------------------------------------
  public void collect(int doc, float score) {
    if (score > 0.0f) {
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
        hq.insert(new ScoreDoc(doc, score));
        minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
      }
    }
  }
[See how it allocates a new ScoreDoc every time this method iscalled]
TopDocCollector's new implementation of collect()
-----------------------------------------------------------------------------
  public void collect(int doc, float score) {
      if (score == 0.0f) {
          return;
      }
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
          if (sd == null) {
              sd = new ScoreDoc(doc, score);
          } else {
              sd.doc = doc;
              sd.score = score;
          }
          sd = (ScoreDoc) hq.insertWithOverflow(sd);
minScore = ((ScoreDoc)hq.top()).score; // maintainminScore
      }
  }
[sd is a class memeber of the collector, of type ScoreDoc]

Now for the performance tests. I've done two tests:
1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and100,000,000
times using both implementations. Here are the results:
                              1M         10M          100M
Current Collector      218 ms   1,907ms    20,000ms
Modified Collector    141 ms    1,297ms   12,672ms
As you can see, the new implementation 30-40% faster.
2. Wrote some tasks in the benchmark package that makes use ofthe new
PQ
while executing a real search over an index with 1M documents, ofsmall
size
(10 characters). Here are the results:
                          Current TopDocCollector
-----------------------------------------------------------------------------------------------------------------------------------------------
------------> Report Sum By (any) Name (5 about 6 out of 6)
Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
avgUsedMem    avgTotalMem
CreateIndex         0        1            1         10.6        0.09
2,219,112      4,389,376
AddDocs_1000000 - 0 - - 1 - - 1000000 - 52,075.2 - -19.20 -
34,497,176 -   52,689,408
CloseIndex          0        2            1          0.1       16.62
26,178,972     51,378,688
OpenIndex - - - 0 - - 1 - - - - 1 - - 1,000.0 - -0.00 -
48,771,304 -   50,067,968
--> MySearch 0 1 15.3 0.19
48,771,304     50,067,968


                          Modified TopDocCollector
-----------------------------------------------------------------------------------------------------------------------------------------------
------------> Report Sum By (any) Name (5 about 6 out of 6)
Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
avgUsedMem    avgTotalMem
CreateIndex         0        1            1         10.6        0.09
2,207,008      4,389,376
AddDocs_1000000 - 0 - - 1 - - 1000000 - 50,955.4 - -19.62 -
32,531,992 -   44,825,088
CloseIndex          0        2            1          0.1       16.84
57,853,148     61,929,984
OpenIndex - - - 0 - - 1 - - - - 1 - - 1,000.0 - -0.00 -
57,792,136 -   61,929,984
--> MySearch 0 1 17.1 0.14
57,939,856     61,929,984
Notice the time difference in search (0.14 modified vs. 0.19current).
Here
too, a 30% improvement.
One thing that I wasn't able to show, but I think it's prettymuch clear
-->
the new implementation causes a lot less object allocations.Consider
the
typical search for 10 results over an index with 1M documents. The
current
implementation will allocate 1M (!) ScoreDoc instances, while thenew
one
will allocate 11 (10 for the PQ and 1 for reusing). On heavilyloaded
systems, this will result in far less work for the GC.

I would like to suggest to reflect this new implementation in
PriorityQueue
and also modify TopDocCollector to make use of this new method.Several
ways
to do it:
1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
accept
better ones), make insert() deprecated (let's say in 2.3) andrelease
after
that (2.4?) rename insertWithOverflow to insert(). A complexprocess,
but it
won't break existing applications' code.
2. Change insert's signature and state that applications thatmove to2.3(for example), need to change their code as well (actually itwon't
compile).
3. Any other alternatives?
And .. of course, review the classes in the Lucene code that usePQ and
modify them as necessary.

A very long email for a very simple (but important) performance
improvement.
Shai
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Regards,

Shai Erera



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Performance Improvement for Search using PriorityQueue

Reply via email to