No - I didn't try to populate an index with real data and run real queries
(what is "real" after all?). I know from my experience of indexes with
several millions of documents where there are queries with several hundred
thousands results (one query even hit 2.5 M documents). This is typical in
search: users type on average 2.3 terms in a query. The chances you'd hit a
query with huge result set are not that small in such cases (I'm not saying
this is the most common case though, I agree that most of the searches don't
process that many documents).
However, this change will improve performance from the algorithm point of
view - you allocate as many as numRequestedHits+1 no matter how many
documents your query processes.

On Dec 10, 2007 10:59 AM, Michael McCandless <[EMAIL PROTECTED]>
wrote:

>
> Have you done any tests on real queries to see what impact this
> improvement has in practice?  Or, to measure how many ScoreDocs are
> "typically" allocated?
>
> Mike
>
> Shai Erera wrote:
>
> > I agree w.r.t the current implementation, however in the worse case
> > (as we
> > tend to consider when talking about computer algorithms), it will
> > allocate a
> > ScoreDoc per result. With the overflow reuse, it will not allocate
> > those
> > objects, no matter what's the input it gets.
> > Also, notice that there is a performance hit with the current
> > implementation
> > of TopDocCollector: it first checks that the document should be
> > inserted and
> > if so, PQ does the same check again.
> > So, if you change the current implementation to always attempt an
> > insert,
> > you'd gain some performance improvements as well.
> >
> > On Dec 10, 2007 10:15 AM, Paul Elschot <[EMAIL PROTECTED]> wrote:
> >
> >> The current TopDocCollector only allocates a ScoreDoc when the given
> >> score causes a new ScoreDoc to be added into the queue, but it does
> >> not reuse anything that overflows out of the queue.
> >> So, reusing the overflow out of the queue should reduce object
> >> allocations. especially for indexes that tend to have better scoring
> >> docs at the end. I wouldn't expect a 30% improvement out of that,
> >> but it would help, if only to reduce occasional performance
> >> deteriorations.
> >>
> >> Regards,
> >> Paul Elschot
> >>
> >>
> >> On Monday 10 December 2007 08:11:50 Shai Erera wrote:
> >>> Hi
> >>>
> >>> Lucene's PQ implements two methods: put (assumes the PQ has room
> >>> for the
> >>> object) and insert (checks whether the object can be inserted
> >>> etc.). The
> >>> implementation of insert() requires the application that uses it to
> >> allocate
> >>> a new object every time it calls insert. Specifically, it cannot
> >>> reuse
> >> the
> >>> objects that were removed from the PQ.
> >>> I've done some measurements on the performance that search would
> >>> gain by
> >>> using that method, through the change of TopDocCollector.
> >>>
> >>> PriorityQueue change (added insertWithOverflow method)
> >>>
> >> ---------------------------------------------------------------------
> >> --------------
> >>>     /**
> >>>      * insertWithOverflow() is similar to insert(), except its
> >>> return
> >> value:
> >>> it
> >>>      * returns the object (if any) that was dropped off the heap
> >>> because
> >> it
> >>> was
> >>>      * full. This can be the given parameter (in case it is
> >>> smaller than
> >> the
> >>>      * full heap's minimum, and couldn't be added) or another object
> >> that
> >>> was
> >>>      * previously the smallest value in the heap and now has been
> >> replaced
> >>> by a
> >>>      * larger one.
> >>>      */
> >>>     public Object insertWithOverflow(Object element) {
> >>>         if (size < maxSize) {
> >>>             put(element);
> >>>             return null;
> >>>         } else if (size > 0 && !lessThan(element, top())) {
> >>>             Object ret = heap[1];
> >>>             heap[1] = element;
> >>>             adjustTop();
> >>>             return ret;
> >>>         } else {
> >>>             return element;
> >>>         }
> >>>     }
> >>> [Very similar to insert(), only it returns the object that was
> >>> kicked
> >> out of
> >>> the Queue, or null]
> >>>
> >>> TopDocCollector's current implementation of collect()
> >>>
> >> ---------------------------------------------------------------------
> >> -------
> >>>   public void collect(int doc, float score) {
> >>>     if (score > 0.0f) {
> >>>       totalHits++;
> >>>       if (hq.size() < numHits || score >= minScore) {
> >>>         hq.insert(new ScoreDoc(doc, score));
> >>>         minScore = ((ScoreDoc)hq.top()).score; // maintain minScore
> >>>       }
> >>>     }
> >>>   }
> >>> [See how it allocates a new ScoreDoc every time this method is
> >>> called]
> >>>
> >>> TopDocCollector's new implementation of collect()
> >>>
> >> ---------------------------------------------------------------------
> >> --------
> >>>   public void collect(int doc, float score) {
> >>>       if (score == 0.0f) {
> >>>           return;
> >>>       }
> >>>       totalHits++;
> >>>       if (hq.size() < numHits || score >= minScore) {
> >>>           if (sd == null) {
> >>>               sd = new ScoreDoc(doc, score);
> >>>           } else {
> >>>               sd.doc = doc;
> >>>               sd.score = score;
> >>>           }
> >>>           sd = (ScoreDoc) hq.insertWithOverflow(sd);
> >>>           minScore = ((ScoreDoc)hq.top()).score; // maintain
> >>> minScore
> >>>       }
> >>>   }
> >>> [sd is a class memeber of the collector, of type ScoreDoc]
> >>>
> >>> Now for the performance tests. I've done two tests:
> >>> 1. Calling TopDocCollector.collect 1,000,000, 10,000,000 and
> >>> 100,000,000
> >>> times using both implementations. Here are the results:
> >>>                               1M         10M          100M
> >>> Current Collector      218 ms   1,907ms    20,000ms
> >>> Modified Collector    141 ms    1,297ms   12,672ms
> >>> As you can see, the new implementation 30-40% faster.
> >>>
> >>> 2. Wrote some tasks in the benchmark package that makes use of
> >>> the new
> >> PQ
> >>> while executing a real search over an index with 1M documents, of
> >>> small
> >> size
> >>> (10 characters). Here are the results:
> >>>                           Current TopDocCollector
> >>>
> >> ---------------------------------------------------------------------
> >> ---------------------------------------------------------------------
> >> -----
> >>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
> >>> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> >>> avgUsedMem    avgTotalMem
> >>> CreateIndex         0        1            1         10.6        0.09
> >>> 2,219,112      4,389,376
> >>> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   52,075.2 -  -
> >>> 19.20 -
> >>> 34,497,176 -   52,689,408
> >>> CloseIndex          0        2            1          0.1       16.62
> >>> 26,178,972     51,378,688
> >>> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -
> >>> 0.00 -
> >>> 48,771,304 -   50,067,968
> >>> --> MySearch            0        1            1
> >>> 5.3        0.19
> >>> 48,771,304     50,067,968
> >>>
> >>>
> >>>                           Modified TopDocCollector
> >>>
> >> ---------------------------------------------------------------------
> >> ---------------------------------------------------------------------
> >> -----
> >>> ------------> Report Sum By (any) Name (5 about 6 out of 6)
> >>> Operation       round   runCnt   recsPerRun        rec/s  elapsedSec
> >>> avgUsedMem    avgTotalMem
> >>> CreateIndex         0        1            1         10.6        0.09
> >>> 2,207,008      4,389,376
> >>> AddDocs_1000000 -   0 -  -   1 -  - 1000000 -   50,955.4 -  -
> >>> 19.62 -
> >>> 32,531,992 -   44,825,088
> >>> CloseIndex          0        2            1          0.1       16.84
> >>> 57,853,148     61,929,984
> >>> OpenIndex -  -  -   0 -  -   1 -  -  -  - 1 -  - 1,000.0 -  -
> >>> 0.00 -
> >>> 57,792,136 -   61,929,984
> >>> --> MySearch            0        1            1
> >>> 7.1        0.14
> >>> 57,939,856     61,929,984
> >>> Notice the time difference in search (0.14 modified vs. 0.19
> >>> current).
> >> Here
> >>> too, a 30% improvement.
> >>>
> >>> One thing that I wasn't able to show, but I think it's pretty
> >>> much clear
> >> -->
> >>> the new implementation causes a lot less object allocations.
> >>> Consider
> >> the
> >>> typical search for 10 results over an index with 1M documents. The
> >> current
> >>> implementation will allocate 1M (!) ScoreDoc instances, while the
> >>> new
> >> one
> >>> will allocate 11 (10 for the PQ and 1 for reusing). On heavily
> >>> loaded
> >>> systems, this will result in far less work for the GC.
> >>>
> >>> I would like to suggest to reflect this new implementation in
> >> PriorityQueue
> >>> and also modify TopDocCollector to make use of this new method.
> >>> Several
> >> ways
> >>> to do it:
> >>> 1. Add insertWithOverflow to PQ (I hate the name and I'm willing to
> >> accept
> >>> better ones), make insert() deprecated (let's say in 2.3) and
> >>> release
> >> after
> >>> that (2.4?) rename insertWithOverflow to insert(). A complex
> >>> process,
> >> but it
> >>> won't break existing applications' code.
> >>> 2. Change insert's signature and state that applications that
> >>> move to
> >>> 2.3(for example), need to change their code as well (actually it
> >>> won't
> >>> compile).
> >>> 3. Any other alternatives?
> >>>
> >>> And .. of course, review the classes in the Lucene code that use
> >>> PQ and
> >>> modify them as necessary.
> >>>
> >>> A very long email for a very simple (but important) performance
> >> improvement.
> >>>
> >>> Shai
> >>>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > Shai Erera
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


-- 
Regards,

Shai Erera

Reply via email to