Hi

Well, I have results from a 1M index for now (the index contains 100K
documents duplicated 10 times, so it's not the final test I'll run, but it
still shows something). I ran 2000 short queries (2.4 keywords on average)
against that 1M-doc index, after a 50-query warm-up. Here are the results:

Current TopDocCollector:
------------------------------------
num queries: 2000
numDocs=1000000
total time: 15910 ms
avg time: 7.955 ms
avg. allocations: 79.7445
total allocation time: 0
avg. num results: 54841.26

Modified TopDocCollector:
-------------------------------------
num queries: 2000
numDocs=1000000
total time: 15909 ms
avg time: 7.9545 ms
avg. allocations: 9.8565
total allocation time: 0
avg. num results: 54841.26

As you can see, the actual allocation time is negligible and there isn't
much difference in the avg. running times of the queries. However, the
*current* runs performed a lot worse at the beginning, before the OS cache
warmed up.
The only significant difference is the number of allocations - the modified
TDC and PQ allocate ~90% (!) fewer objects. This is significant, especially
in heavily loaded systems.
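
In case the source of the savings isn't obvious from the patch quoted
further down, here is a tiny standalone toy (plain JDK classes only -
ReuseDemo, Hit and someScore() are made-up names, this is not the actual
patch) showing how recycling the evicted entry caps allocations at numHits
holder objects per query:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Toy illustration only (not Lucene code): keep the top-N (doc, score)
    // pairs while allocating at most numHits holder objects, by recycling
    // the entry that falls off the bottom of the queue.
    public class ReuseDemo {

      static final class Hit {
        int doc;
        float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
      }

      public static void main(String[] args) {
        final int numHits = 10;
        int allocations = 0;

        // min-queue ordered by score, so peek() is the worst hit kept so far
        PriorityQueue<Hit> pq = new PriorityQueue<Hit>(numHits, new Comparator<Hit>() {
          public int compare(Hit a, Hit b) { return Float.compare(a.score, b.score); }
        });

        for (int doc = 0; doc < 1000000; doc++) {
          float score = someScore(doc);
          if (pq.size() < numHits) {
            pq.offer(new Hit(doc, score));   // only the first numHits calls allocate
            allocations++;
          } else if (score > pq.peek().score) {
            Hit reused = pq.poll();          // recycle the evicted holder
            reused.doc = doc;
            reused.score = score;
            pq.offer(reused);
          }
        }
        System.out.println("kept " + pq.size() + " hits, allocated " + allocations);
      }

      // stand-in scoring function, just to make the toy runnable
      static float someScore(int doc) { return (doc * 31 % 1000) / 1000f; }
    }

The recycled Hit holder plays the role the overflowed ScoreDoc plays in the
patch quoted below; the current TopDocCollector instead allocates a fresh
ScoreDoc for every hit that makes it past the minScore check.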

Tomorrow I will run the same test on a 10M (unique) document index, but I
think the picture is already getting clear ...
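
As a side note, the worst case Paul suggests below (slowly increasing
scores, so nearly every hit displaces the current queue minimum) is easy to
exercise directly against the collector. A rough sketch of such a
micro-benchmark - not the code that produced the numbers above, and the
class name WorstCaseCollect is made up:

    import org.apache.lucene.search.TopDocCollector;

    // Sketch of a worst-case micro-benchmark: scores creep upward, so nearly
    // every collect() call passes the minScore check and does queue work.
    public class WorstCaseCollect {
      public static void main(String[] args) {
        TopDocCollector collector = new TopDocCollector(10);
        int numDocs = 1000000;
        long start = System.currentTimeMillis();
        for (int doc = 0; doc < numDocs; doc++) {
          // slowly (and almost strictly) increasing scores
          collector.collect(doc, 1.0f + doc * 1e-6f);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(collector.getTotalHits() + " hits collected in " + elapsed + " ms");
      }
    }

Run against both collectors, this should separate the two implementations
more sharply than the mixed-score queries above, since every call ends up
touching the queue.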


On Dec 10, 2007 6:15 PM, Paul Elschot <[EMAIL PROTECTED]> wrote:

> On Monday 10 December 2007 09:19:43 Shai Erera wrote:
> > I agree w.r.t. the current implementation; however, in the worst case
> > (as we tend to consider when talking about computer algorithms), it will
> > allocate a ScoreDoc per result. With the overflow reuse, it will not
> > allocate those objects, no matter what input it gets.
> > Also, notice that there is a performance hit with the current
> > implementation of TopDocCollector: it first checks that the document
> > should be inserted and, if so, PQ does the same check again.
> > So, if you change the current implementation to always attempt an
> > insert, you'd gain some performance improvement as well.
>
> One could consider a specialized priority queue, somewhat like
> ScorerDocQueue, with an insert operation that takes the arguments
> of the collect() call instead of a ScoreDoc object.
> I don't know where HitQueue is currently used, but that could be
> the place to inline PriorityQueue instead of inheriting from it.
> Otherwise TopDocCollector could be used for inlining PriorityQueue.
>
> Although practical tests are always good, an extra performance test
> with the worst case of slowly increasing score values should be quite
> helpful for comparing with the current implementation and for getting
> the last bits of performance out of this.
>
> Regards,
> Paul Elschot
>
>
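
Just to make sure I understand the specialized-queue idea above - something
along these lines, I assume? A rough, untested sketch only; DocScoreQueue
and everything in it is made up for illustration and is not part of the
patch:

    // Sketch of a specialized queue whose insert() takes the collect()
    // arguments directly, backed by two parallel arrays, so no ScoreDoc is
    // allocated per insertion. Ordering follows HitQueue: lower score is
    // "worse", ties go to the larger doc id; the worst kept hit is at index 1.
    final class DocScoreQueue {
      private final int[] docs;      // 1-based heap of doc ids (index 0 unused)
      private final float[] scores;  // parallel heap of scores
      private final int maxSize;
      private int size;

      DocScoreQueue(int maxSize) {
        this.maxSize = maxSize;
        this.docs = new int[maxSize + 1];
        this.scores = new float[maxSize + 1];
      }

      private boolean worse(float sA, int dA, float sB, int dB) {
        return sA == sB ? dA > dB : sA < sB;
      }

      /** Inserts (doc, score), dropping the worst entry when full.
       *  Never allocates. Returns true if the hit was kept. */
      boolean insert(int doc, float score) {
        if (size < maxSize) {
          // up-heap: move the new entry up while it is worse than its parent
          int i = ++size;
          while (i > 1 && worse(score, doc, scores[i >> 1], docs[i >> 1])) {
            docs[i] = docs[i >> 1];
            scores[i] = scores[i >> 1];
            i >>= 1;
          }
          docs[i] = doc;
          scores[i] = score;
          return true;
        } else if (size > 0 && !worse(score, doc, scores[1], docs[1])) {
          // better than the current worst: overwrite the root, then down-heap
          int i = 1;
          while (true) {
            int child = i << 1;
            if (child > size) break;
            if (child < size
                && worse(scores[child + 1], docs[child + 1], scores[child], docs[child])) {
              child++;                        // pick the worse of the two children
            }
            if (!worse(scores[child], docs[child], score, doc)) break;
            docs[i] = docs[child];            // pull the worse child up
            scores[i] = scores[child];
            i = child;
          }
          docs[i] = doc;
          scores[i] = score;
          return true;
        }
        return false;                         // not competitive, dropped
      }

      int size() { return size; }
      float topScore() { return scores[1]; }  // score of the worst kept hit
      int topDoc() { return docs[1]; }
    }

With the parallel int[]/float[] arrays nothing is allocated after
construction, which is even less garbage than the overflow-reuse approach -
at the price of a queue that only works for (doc, score) hits.
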
> >
> > On Dec 10, 2007 10:15 AM, Paul Elschot <[EMAIL PROTECTED]> wrote:
> >
> > > The current TopDocCollector only allocates a ScoreDoc when the given
> > > score causes a new ScoreDoc to be added into the queue, but it does
> > > not reuse anything that overflows out of the queue.
> > > So, reusing the overflow out of the queue should reduce object
> > > allocations, especially for indexes that tend to have better scoring
> > > docs at the end. I wouldn't expect a 30% improvement out of that,
> > > but it would help, if only to reduce occasional performance
> > > deteriorations.
> > >
> > > Regards,
> > > Paul Elschot
> > >
> > >
> > > On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> > > > Hi
> > > >
> > > > Lucene's PQ implements two methods: put() (assumes the PQ has room
> > > > for the object) and insert() (checks whether the object can be
> > > > inserted, etc.). The implementation of insert() requires the
> > > > application that uses it to allocate a new object every time it
> > > > calls insert. Specifically, it cannot reuse the objects that were
> > > > removed from the PQ.
> > > > I've done some measurements on the performance that search would
> > > > gain by using such a method, through a change to TopDocCollector.
> > > >
> > > > PriorityQueue change (added insertWithOverflow method)
> > > > ------------------------------------------------------
> > > >     /**
> > > >      * insertWithOverflow() is similar to insert(), except its return
> > > >      * value: it returns the object (if any) that was dropped off the
> > > >      * heap because it was full. This can be the given parameter (in
> > > >      * case it is smaller than the full heap's minimum, and couldn't
> > > >      * be added) or another object that was previously the smallest
> > > >      * value in the heap and now has been replaced by a larger one.
> > > >      */
> > > >     public Object insertWithOverflow(Object element) {
> > > >         if (size < maxSize) {
> > > >             put(element);
> > > >             return null;
> > > >         } else if (size > 0 && !lessThan(element, top())) {
> > > >             Object ret = heap[1];
> > > >             heap[1] = element;
> > > >             adjustTop();
> > > >             return ret;
> > > >         } else {
> > > >             return element;
> > > >         }
> > > >     }
> > > > [Very similar to insert(), only it returns the object that was
> > > > kicked out of the queue, or null]
> > > >
> > > > TopDocCollector's current implementation of collect()
> > > > ------------------------------------------------------
> > > >   public void collect(int doc, float score) {
> > > >     if (score > 0.0f) {
> > > >       totalHits++;
> > > >       if (hq.size() < numHits || score >= minScore) {
> > > >         hq.insert(new ScoreDoc(doc, score));
> > > >         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> > > >       }
> > > >     }
> > > >   }
> > > > [See how it allocates a new ScoreDoc every time this method is
> > > > called]
> > > >
> > > > TopDocCollector's new implementation of collect()
> > > > --------------------------------------------------
> > > >   public void collect(int doc, float score) {
> > > >       if (score == 0.0f) {
> > > >           return;
> > > >       }
> > > >       totalHits++;
> > > >       if (hq.size() < numHits || score >= minScore) {
> > > >           if (sd == null) {
> > > >               sd = new ScoreDoc(doc, score);
> > > >           } else {
> > > >               sd.doc = doc;
> > > >               sd.score = score;
> > > >           }
> > > >           sd = (ScoreDoc) hq.insertWithOverflow(sd);
> > > >           minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> > > >       }
> > > >   }
> > > > [sd is a class member of the collector, of type ScoreDoc]
> > > >
> > > > Now for the performance tests. I've done two tests:
> > > > 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and
> > > > 100,000,000 times using both implementations. Here are the results:
> > > >                             1M          10M          100M
> > > > Current Collector       218 ms     1,907 ms     20,000 ms
> > > > Modified Collector      141 ms     1,297 ms     12,672 ms
> > > > As you can see, the new implementation is 30-40% faster.
> > > >
> > > > 2. Wrote some tasks in the benchmark package that make use of the
> > > > new PQ while executing a real search over an index with 1M small
> > > > documents (10 characters each). Here are the results:
> > > > Current TopDocCollector
> > > > -----------------------
> > > > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > > > Operation         round  runCnt  recsPerRun      rec/s  elapsedSec   avgUsedMem  avgTotalMem
> > > > CreateIndex           0       1           1       10.6        0.09    2,219,112    4,389,376
> > > > AddDocs_1000000       0       1     1000000   52,075.2       19.20   34,497,176   52,689,408
> > > > CloseIndex            0       2           1        0.1       16.62   26,178,972   51,378,688
> > > > OpenIndex             0       1           1    1,000.0        0.00   48,771,304   50,067,968
> > > > --> MySearch          0       1           1        5.3        0.19   48,771,304   50,067,968
> > > >
> > > >
> > > > Modified TopDocCollector
> > > > ------------------------
> > > > ------------> Report Sum By (any) Name (5 about 6 out of 6)
> > > > Operation         round  runCnt  recsPerRun      rec/s  elapsedSec   avgUsedMem  avgTotalMem
> > > > CreateIndex           0       1           1       10.6        0.09    2,207,008    4,389,376
> > > > AddDocs_1000000       0       1     1000000   50,955.4       19.62   32,531,992   44,825,088
> > > > CloseIndex            0       2           1        0.1       16.84   57,853,148   61,929,984
> > > > OpenIndex             0       1           1    1,000.0        0.00   57,792,136   61,929,984
> > > > --> MySearch          0       1           1        7.1        0.14   57,939,856   61,929,984
> > > > Notice the time difference in search (0.14 sec modified vs. 0.19 sec
> > > > current). Here too, a 30% improvement.
> > > >
> > > > One thing that I wasn't able to show, but I think is pretty clear:
> > > > the new implementation causes a lot fewer object allocations.
> > > > Consider the typical search for 10 results over an index with 1M
> > > > documents. The current implementation will allocate 1M (!) ScoreDoc
> > > > instances, while the new one will allocate 11 (10 for the PQ and 1
> > > > for reuse). On heavily loaded systems, this will result in far less
> > > > work for the GC.
> > > >
> > > > I would like to suggest reflecting this new implementation in
> > > > PriorityQueue and also modifying TopDocCollector to make use of the
> > > > new method. Several ways to do it:
> > > > 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
> > > > accept better ones), deprecate insert() (let's say in 2.3) and, in
> > > > the release after that (2.4?), rename insertWithOverflow to
> > > > insert(). A complex process, but it won't break existing
> > > > applications' code.
> > > > 2. Change insert's signature and state that applications that move
> > > > to 2.3 (for example) need to change their code as well (actually it
> > > > won't compile).
> > > > 3. Any other alternatives?
> > > >
> > > > And, of course, review the classes in the Lucene code that use PQ
> > > > and modify them as necessary.
> > > >
> > > > A very long email for a very simple (but important) performance
> > > > improvement.
> > > >
> > > > Shai
> > > >
> > >
> > >
> > >
> >
> >
>
>
>


-- 
Regards,

Shai Erera
