Hi,
I will open an issue and create the patch. One thing I'm not sure of is the
wouldBeInserted method you mentioned - in what context should it be used?
And ... lessThan shouldn't be public, it can stay protected.


On 12/11/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
>
> I think we can't make lessThan public since that would cause
> subclasses to fail to compile (ie this breaks backwards compatibility)?
>
> Adding "wouldBeInserted()" seems OK?
>
> Mike
>
> Peter Keegan wrote:
>
> > See my similar request from last March:
> > http://www.nabble.com/FieldSortedHitQueue-enhancement-
> > to9733550.html#a9733550
> >
> > Peter
> >
> > On Dec 11, 2007 11:54 AM, Nadav Har'El <[EMAIL PROTECTED]>
> > wrote:
> >
> >> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance
> >> Improvement for
> >> Search using PriorityQueue":
> >>> Hi
> >>>
> >>> Lucene's PQ implements two methods: put (assumes the PQ has room
> >>> for the
> >>> object) and insert (checks whether the object can be inserted
> >>> etc.). The
> >>> implementation of insert() requires the application that uses it to
> >> allocate
> >>> a new object every time it calls insert. Specifically, it cannot
> >>> reuse
> >> the
> >>> objects that were removed from the PQ.
> >>
> >> I've read this entire thread, and would like to add my comments about
> >> three
> >> independent issues, which I think that can and perhaps should be
> >> considered
> >> separately:
> >>
> >> 1. When Shai wanted to add the insertWithOverflow() method to
> >> PriorityQueue
> >>   he couldn't just subclass PriorityQueue in his application, but
> >> rather
> >>   was forced to modify PriorityQueue itself. Why? just because one
> >> field
> >>   of that classi - "heap" - was defined "private" instead of
> >> "protected".
> >>
> >>   Is there a special reason for that? If not, can we make the trivial
> >> change
> >>   to make PriorityQueue's fields protected, to allow Shai and
> >> others (see
> >> the
> >>   next point) to add functionality on top of PriorityQueue?
> >>
> >> 2. PriorityQueue, in addition to being used in about a dozen
> >> places inside
> >>   Lucene, is also a public class that advanced users often find
> >> useful
> >> when
> >>   implementing features like new collectors, new queries, and so on.
> >>   Unfortunately, my experience exactly matches Shai's: In the two
> >> occasions
> >>   where I used a PriorityQueue, I found that I needed such a
> >>   insertWithOverflow() method. If this feature is so useful (I can't
> >> believe
> >>   that Shai and me are the only ones who found it useful), I think it
> >> would
> >>   be nice to add it to Lucene's PriorityQueue, even if it isn't
> >> (yet) used
> >>   inside Lucene.
> >>
> >>   Just to make it sound more interesting, let me give you an
> >> example where
> >>   I needed (and implemented) an "insertWithOverflow()": I was
> >> implementing
> >> a
> >>   faceted search capability over Lucene. It calculated a count for
> >> each
> >>   facet value, and then I used a PriorityQueue to find the 10 best
> >> values.
> >>   The problem is that I also needed an "other" aggregator, which was
> >> supposed
> >>   to aggregate (in various ways) all the facets except the 10 best
> >> ones.
> >> For
> >>   that, I needed to know which facets dropped off the priorityqueue.
> >>
> >> 3. Finally, Shai asked for this new
> >> PriorityQueue.insertWithOverflow()
> >>   to be used in TopDocCollector. I have to admit I don't know how
> >> much
> >>   of a benefit this will be in the "typical" case. But I do know that
> >>   there's no such thing as a "typical" case...
> >>   I can easily think of a quite typical "worst case" though:
> >> Consider a
> >>   collection indexed in order of document age (a pretty typical
> >> scenario
> >>   for a long-running index), and then you do a sorting query
> >>   (TopFieldDocCollector), asking it to bring the 10 newest documents.
> >>   In that case, each and every document will have a new DocScore
> >> created -
> >>   which is the worst-case that Shai feared.
> >>   It would be nice if Shai or someone else could provide a
> >> measurement in
> >>   that case.
> >>
> >> P.S. When looking now at PriorityQueue's code, I found two tiny
> >> performance improvements that could be easily made to it - I
> >> wonder if
> >> there's any reason not to do them:
> >>
> >>  1. Insert can use heap[1] directly instead of calling top().
> >> After all,
> >>    this is done in an if() that already ensures that size>0.
> >>
> >>  2. Regardless, top() could return heap[1] always, without any if
> >> (). After
> >>    all, the heap array is initialized to all nulls, so when size==0,
> >> heap[1]
> >>    is null anyway.
> >>
> >>
> >>
> >>> PriorityQueue change (added insertWithOverflow method)
> >>>
> >> ---------------------------------------------------------------------
> >> --------------
> >>>     /**
> >>>      * insertWithOverflow() is similar to insert(), except its
> >>> return
> >> value:
> >>> it
> >>>      * returns the object (if any) that was dropped off the heap
> >>> because
> >> it
> >>> was
> >>>      * full. This can be the given parameter (in case it is
> >>> smaller than
> >> the
> >>>      * full heap's minimum, and couldn't be added) or another object
> >> that
> >>> was
> >>>      * previously the smallest value in the heap and now has been
> >> replaced
> >>> by a
> >>>      * larger one.
> >>>      */
> >>>     public Object insertWithOverflow(Object element) {
> >>>         if (size < maxSize) {
> >>>             put(element);
> >>>             return null;
> >>>         } else if (size > 0 && !lessThan(element, top())) {
> >>>             Object ret = heap[1];
> >>>             heap[1] = element;
> >>>             adjustTop();
> >>>             return ret;
> >>>         } else {
> >>>             return element;
> >>>         }
> >>>     }
> >>> [Very similar to insert(), only it returns the object that was
> >>> kicked
> >> out of
> >>> the Queue, or null]
> >>
> >> --
> >> Nadav Har'El                        |       Tuesday, Dec 11 2007,
> >> 3 Tevet
> >> 5768
> >> IBM Haifa Research Lab
> >>  |-----------------------------------------
> >>                                    |A professor is one who talks in
> >> someone
> >> http://nadav.harel.org.il           |else's sleep.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


-- 
Regards,

Shai Erera

Reply via email to