Re: Performance Improvement for Search using PriorityQueue

robert engels Mon, 10 Dec 2007 12:10:31 -0800

As has been pointed out on many threads, a modern JVM really doesn'tneed you to recycle objects, especially for small short lived objects.

It is actually less efficient in many cases (since the variables needto be reinitialized).

Using object pools (except when pooling external resources) is awaste of time.


On Dec 10, 2007, at 1:31 PM, Shai Erera wrote:

Hi

Well, I have results from a 1M index for now (the index contains 100K

documents duplicated 10 times, so it's not the final test I'll run,but itstill shows something). I ran 2000 short queries (2.4 keywords onaverage)on a 1M docs index, after 50 queries warm-up. Following are theresults:


Current TopDocCollector:
------------------------------------
num queries: 2000
numDocs=1000000
total time: 15910 ms
avg time: 7.955 ms
avg. allocations: 79.7445
total allocation time: 0
avg. num results: 54841.26

Modified TopDocCollector:
-------------------------------------
num queries: 2000
numDocs=1000000
total time: 15909 ms
avg time: 7.9545 ms
avg. allocations: 9.8565
total allocation time: 0
avg. num results: 54841.26

As you can see, the actual allocation time is really negligible andthereisn't much difference in the avg. running times of the queries.However, the*current* runs performed a lot worse at the beginning, before theOS cache

warmed up.

The only significant difference is the number of allocations - themodifiedTDC and PQ allocate ~90% (!) less objects. This is significant,especially

in heavy loaded systems.

Tomorrow I will run the same test on a 10M (unique) documentsindex, but I

think the picture is getting clear ...


On Dec 10, 2007 6:15 PM, Paul Elschot <[EMAIL PROTECTED]> wrote:

On Monday 10 December 2007 09:19:43 Shai Erera wrote:

I agree w.r.t the current implementation, however in the worsecase (as

we

tend to consider when talking about computer algorithms), it will

allocate a

ScoreDoc per result. With the overflow reuse, it will notallocate those
objects, no matter what's the input it gets.
Also, notice that there is a performance hit with the current

implementation

of TopDocCollector: it first checks that the document should beinserted

and

if so, PQ does the same check again.
So, if you change the current implementation to always attempt an

insert,

you'd gain some performance improvements as well.


One could consider a specialized priority queue, somewhat like
ScorerDocQueue, with an insert operation using the arguments
as in the collect() call instead a ScoreDoc object.
I don't know where HitQueue is currently used, but that could be
the place to inline PriorityQueue instead of inheriting from it.
Otherwise TopDocCollector could be used for inlining PriorityQueue.

Although practical tests are always good, an extra performance test
with the worst case of slowly increasing score values should be quite

helpful for comparing with the current implementation and to getthe last

bits of performance out of this.

Regards,
Paul Elschot

On Dec 10, 2007 10:15 AM, Paul Elschot <[EMAIL PROTECTED]>wrote:
The current TopDocCollector only allocates a ScoreDoc when thegiven
score causes a new ScoreDoc to be added into the queue, but it does
not reuse anything that overflows out of the queue.
So, reusing the overflow out of the queue should reduce object
allocations. especially for indexes that tend to have betterscoring
docs at the end. I wouldn't expect a 30% improvement out of that,
but it would help, if only to reduce occasional performance
deteriorations.

Regards,
Paul Elschot


On Monday 10 December 2007 08:11:50 Shai Erera wrote:
Hi
Lucene's PQ implements two methods: put (assumes the PQ hasroom for

the

object) and insert (checks whether the object can be insertedetc.).

The

implementation of insert() requires the application that usesit to
allocate
a new object every time it calls insert. Specifically, it cannot

reuse

the

objects that were removed from the PQ.
I've done some measurements on the performance that search would

gain by

using that method, through the change of TopDocCollector.

PriorityQueue change (added insertWithOverflow method)

-----------------------------------------------------------------------------------

    /**
     * insertWithOverflow() is similar to insert(), except its

return

value:

it
     * returns the object (if any) that was dropped off the heap

because

it
was
* full. This can be the given parameter (in case it issmaller

than

the

* full heap's minimum, and couldn't be added) or anotherobject

that

was
     * previously the smallest value in the heap and now has been

replaced

by a
     * larger one.
     */
    public Object insertWithOverflow(Object element) {
        if (size < maxSize) {
            put(element);
            return null;
        } else if (size > 0 && !lessThan(element, top())) {
            Object ret = heap[1];
            heap[1] = element;
            adjustTop();
            return ret;
        } else {
            return element;
        }
    }
[Very similar to insert(), only it returns the object that was

kicked

out of

the Queue, or null]

TopDocCollector's current implementation of collect()

----------------------------------------------------------------------------

  public void collect(int doc, float score) {
    if (score > 0.0f) {
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
        hq.insert(new ScoreDoc(doc, score));

minScore = ((ScoreDoc)hq.top()).score; // maintainminScore

      }
    }
  }
[See how it allocates a new ScoreDoc every time this method is

called]


TopDocCollector's new implementation of collect()

-----------------------------------------------------------------------------

  public void collect(int doc, float score) {
      if (score == 0.0f) {
          return;
      }
      totalHits++;
      if (hq.size() < numHits || score >= minScore) {
          if (sd == null) {
              sd = new ScoreDoc(doc, score);
          } else {
              sd.doc = doc;
              sd.score = score;
          }
          sd = (ScoreDoc) hq.insertWithOverflow(sd);
          minScore = ((ScoreDoc)hq.top()).score; // maintain

minScore

      }
  }
[sd is a class memeber of the collector, of type ScoreDoc]

Now for the performance tests. I've done two tests:
1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and

100,000,000

times using both implementations. Here are the results:
                              1M         10M          100M
Current Collector      218 ms   1,907ms    20,000ms
Modified Collector    141 ms    1,297ms   12,672ms
As you can see, the new implementation 30-40% faster.

2. Wrote some tasks in the benchmark package that makes use of the

new

PQ

while executing a real search over an index with 1M documents, of

small

size

(10 characters). Here are the results:
                          Current TopDocCollector

-----------------------------------------------------------------------------------------------------------------------------------------------

------------> Report Sum By (any) Name (5 about 6 out of 6)
Operation round runCnt recsPerRun rec/selapsedSec
avgUsedMem    avgTotalMem
CreateIndex 0 1 1 10.60.09
2,219,112      4,389,376
AddDocs_1000000 - 0 - - 1 - - 1000000 - 52,075.2 - -19.20-
34,497,176 -   52,689,408
CloseIndex 0 2 1 0.116.62
26,178,972     51,378,688
OpenIndex - - - 0 - - 1 - - - - 1 - - 1,000.0 - -0.00-
48,771,304 -   50,067,968
--> MySearch            0        1            1          5.3

0.19

48,771,304     50,067,968


                          Modified TopDocCollector

-----------------------------------------------------------------------------------------------------------------------------------------------

------------> Report Sum By (any) Name (5 about 6 out of 6)
Operation round runCnt recsPerRun rec/selapsedSec
avgUsedMem    avgTotalMem
CreateIndex 0 1 1 10.60.09
2,207,008      4,389,376
AddDocs_1000000 - 0 - - 1 - - 1000000 - 50,955.4 - -19.62-
32,531,992 -   44,825,088
CloseIndex 0 2 1 0.116.84
57,853,148     61,929,984
OpenIndex - - - 0 - - 1 - - - - 1 - - 1,000.0 - -0.00-
57,792,136 -   61,929,984
--> MySearch            0        1            1          7.1

0.14

57,939,856     61,929,984
Notice the time difference in search (0.14 modified vs.0.19current).
Here
too, a 30% improvement.

One thing that I wasn't able to show, but I think it's pretty much

clear

-->

the new implementation causes a lot less object allocations.

Consider

the

typical search for 10 results over an index with 1M documents. The

current

implementation will allocate 1M (!) ScoreDoc instances, while the

new

one

will allocate 11 (10 for the PQ and 1 for reusing). On heavily

loaded

systems, this will result in far less work for the GC.

I would like to suggest to reflect this new implementation in

PriorityQueue

and also modify TopDocCollector to make use of this new method.

Several

ways
to do it:
1. Add insertWithOverflow to PQ (I hate the name and I'mwilling to
accept
better ones), make insert() deprecated (let's say in 2.3) and

release

after

that (2.4?) rename insertWithOverflow to insert(). A complex

process,

but it

won't break existing applications' code.
2. Change insert's signature and state that applications that move

to

2.3(for example), need to change their code as well (actually it

won't

compile).
3. Any other alternatives?
And .. of course, review the classes in the Lucene code thatuse PQ

and

modify them as necessary.

A very long email for a very simple (but important) performance

improvement.


Shai

---------------------------------------------------------------------

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Regards,

Shai Erera



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Performance Improvement for Search using PriorityQueue

Reply via email to