See my similar request from last March:
http://www.nabble.com/FieldSortedHitQueue-enhancement-to9733550.html#a9733550

Peter

On Dec 11, 2007 11:54 AM, Nadav Har'El <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance Improvement for
> Search using PriorityQueue":
> > Hi
> >
> > Lucene's PQ implements two methods: put (assumes the PQ has room for the
> > object) and insert (checks whether the object can be inserted etc.). The
> > implementation of insert() requires the application that uses it to
> allocate
> > a new object every time it calls insert. Specifically, it cannot reuse
> the
> > objects that were removed from the PQ.
>
> I've read this entire thread, and would like to add my comments about
> three
> independent issues, which I think that can and perhaps should be
> considered
> separately:
>
> 1. When Shai wanted to add the insertWithOverflow() method to
> PriorityQueue
>   he couldn't just subclass PriorityQueue in his application, but rather
>   was forced to modify PriorityQueue itself. Why? just because one field
>   of that classi - "heap" - was defined "private" instead of "protected".
>
>   Is there a special reason for that? If not, can we make the trivial
> change
>   to make PriorityQueue's fields protected, to allow Shai and others (see
> the
>   next point) to add functionality on top of PriorityQueue?
>
> 2. PriorityQueue, in addition to being used in about a dozen places inside
>   Lucene, is also a public class that advanced users often find useful
> when
>   implementing features like new collectors, new queries, and so on.
>   Unfortunately, my experience exactly matches Shai's: In the two
> occasions
>   where I used a PriorityQueue, I found that I needed such a
>   insertWithOverflow() method. If this feature is so useful (I can't
> believe
>   that Shai and me are the only ones who found it useful), I think it
> would
>   be nice to add it to Lucene's PriorityQueue, even if it isn't (yet) used
>   inside Lucene.
>
>   Just to make it sound more interesting, let me give you an example where
>   I needed (and implemented) an "insertWithOverflow()": I was implementing
> a
>   faceted search capability over Lucene. It calculated a count for each
>   facet value, and then I used a PriorityQueue to find the 10 best values.
>   The problem is that I also needed an "other" aggregator, which was
> supposed
>   to aggregate (in various ways) all the facets except the 10 best ones.
> For
>   that, I needed to know which facets dropped off the priorityqueue.
>
> 3. Finally, Shai asked for this new PriorityQueue.insertWithOverflow()
>   to be used in TopDocCollector. I have to admit I don't know how much
>   of a benefit this will be in the "typical" case. But I do know that
>   there's no such thing as a "typical" case...
>   I can easily think of a quite typical "worst case" though: Consider a
>   collection indexed in order of document age (a pretty typical scenario
>   for a long-running index), and then you do a sorting query
>   (TopFieldDocCollector), asking it to bring the 10 newest documents.
>   In that case, each and every document will have a new DocScore created -
>   which is the worst-case that Shai feared.
>   It would be nice if Shai or someone else could provide a measurement in
>   that case.
>
> P.S. When looking now at PriorityQueue's code, I found two tiny
> performance improvements that could be easily made to it - I wonder if
> there's any reason not to do them:
>
>  1. Insert can use heap[1] directly instead of calling top(). After all,
>    this is done in an if() that already ensures that size>0.
>
>  2. Regardless, top() could return heap[1] always, without any if(). After
>    all, the heap array is initialized to all nulls, so when size==0,
> heap[1]
>    is null anyway.
>
>
>
> > PriorityQueue change (added insertWithOverflow method)
> >
> -----------------------------------------------------------------------------------
> >     /**
> >      * insertWithOverflow() is similar to insert(), except its return
> value:
> > it
> >      * returns the object (if any) that was dropped off the heap because
> it
> > was
> >      * full. This can be the given parameter (in case it is smaller than
> the
> >      * full heap's minimum, and couldn't be added) or another object
> that
> > was
> >      * previously the smallest value in the heap and now has been
> replaced
> > by a
> >      * larger one.
> >      */
> >     public Object insertWithOverflow(Object element) {
> >         if (size < maxSize) {
> >             put(element);
> >             return null;
> >         } else if (size > 0 && !lessThan(element, top())) {
> >             Object ret = heap[1];
> >             heap[1] = element;
> >             adjustTop();
> >             return ret;
> >         } else {
> >             return element;
> >         }
> >     }
> > [Very similar to insert(), only it returns the object that was kicked
> out of
> > the Queue, or null]
>
> --
> Nadav Har'El                        |       Tuesday, Dec 11 2007, 3 Tevet
> 5768
> IBM Haifa Research Lab
>  |-----------------------------------------
>                                    |A professor is one who talks in
> someone
> http://nadav.harel.org.il           |else's sleep.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Reply via email to