Hi, I will open an issue and create the patch. One thing I'm not sure of is the wouldBeInserted method you mentioned - in what context should it be used? And ... lessThan shouldn't be public, it can stay protected.
On 12/11/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > > I think we can't make lessThan public since that would cause > subclasses to fail to compile (ie this breaks backwards compatibility)? > > Adding "wouldBeInserted()" seems OK? > > Mike > > Peter Keegan wrote: > > > See my similar request from last March: > > http://www.nabble.com/FieldSortedHitQueue-enhancement- > > to9733550.html#a9733550 > > > > Peter > > > > On Dec 11, 2007 11:54 AM, Nadav Har'El <[EMAIL PROTECTED]> > > wrote: > > > >> On Mon, Dec 10, 2007, Shai Erera wrote about "Performance > >> Improvement for > >> Search using PriorityQueue": > >>> Hi > >>> > >>> Lucene's PQ implements two methods: put (assumes the PQ has room > >>> for the > >>> object) and insert (checks whether the object can be inserted > >>> etc.). The > >>> implementation of insert() requires the application that uses it to > >> allocate > >>> a new object every time it calls insert. Specifically, it cannot > >>> reuse > >> the > >>> objects that were removed from the PQ. > >> > >> I've read this entire thread, and would like to add my comments about > >> three > >> independent issues, which I think that can and perhaps should be > >> considered > >> separately: > >> > >> 1. When Shai wanted to add the insertWithOverflow() method to > >> PriorityQueue > >> he couldn't just subclass PriorityQueue in his application, but > >> rather > >> was forced to modify PriorityQueue itself. Why? just because one > >> field > >> of that classi - "heap" - was defined "private" instead of > >> "protected". > >> > >> Is there a special reason for that? If not, can we make the trivial > >> change > >> to make PriorityQueue's fields protected, to allow Shai and > >> others (see > >> the > >> next point) to add functionality on top of PriorityQueue? > >> > >> 2. PriorityQueue, in addition to being used in about a dozen > >> places inside > >> Lucene, is also a public class that advanced users often find > >> useful > >> when > >> implementing features like new collectors, new queries, and so on. > >> Unfortunately, my experience exactly matches Shai's: In the two > >> occasions > >> where I used a PriorityQueue, I found that I needed such a > >> insertWithOverflow() method. If this feature is so useful (I can't > >> believe > >> that Shai and me are the only ones who found it useful), I think it > >> would > >> be nice to add it to Lucene's PriorityQueue, even if it isn't > >> (yet) used > >> inside Lucene. > >> > >> Just to make it sound more interesting, let me give you an > >> example where > >> I needed (and implemented) an "insertWithOverflow()": I was > >> implementing > >> a > >> faceted search capability over Lucene. It calculated a count for > >> each > >> facet value, and then I used a PriorityQueue to find the 10 best > >> values. > >> The problem is that I also needed an "other" aggregator, which was > >> supposed > >> to aggregate (in various ways) all the facets except the 10 best > >> ones. > >> For > >> that, I needed to know which facets dropped off the priorityqueue. > >> > >> 3. Finally, Shai asked for this new > >> PriorityQueue.insertWithOverflow() > >> to be used in TopDocCollector. I have to admit I don't know how > >> much > >> of a benefit this will be in the "typical" case. But I do know that > >> there's no such thing as a "typical" case... > >> I can easily think of a quite typical "worst case" though: > >> Consider a > >> collection indexed in order of document age (a pretty typical > >> scenario > >> for a long-running index), and then you do a sorting query > >> (TopFieldDocCollector), asking it to bring the 10 newest documents. > >> In that case, each and every document will have a new DocScore > >> created - > >> which is the worst-case that Shai feared. > >> It would be nice if Shai or someone else could provide a > >> measurement in > >> that case. > >> > >> P.S. When looking now at PriorityQueue's code, I found two tiny > >> performance improvements that could be easily made to it - I > >> wonder if > >> there's any reason not to do them: > >> > >> 1. Insert can use heap[1] directly instead of calling top(). > >> After all, > >> this is done in an if() that already ensures that size>0. > >> > >> 2. Regardless, top() could return heap[1] always, without any if > >> (). After > >> all, the heap array is initialized to all nulls, so when size==0, > >> heap[1] > >> is null anyway. > >> > >> > >> > >>> PriorityQueue change (added insertWithOverflow method) > >>> > >> --------------------------------------------------------------------- > >> -------------- > >>> /** > >>> * insertWithOverflow() is similar to insert(), except its > >>> return > >> value: > >>> it > >>> * returns the object (if any) that was dropped off the heap > >>> because > >> it > >>> was > >>> * full. This can be the given parameter (in case it is > >>> smaller than > >> the > >>> * full heap's minimum, and couldn't be added) or another object > >> that > >>> was > >>> * previously the smallest value in the heap and now has been > >> replaced > >>> by a > >>> * larger one. > >>> */ > >>> public Object insertWithOverflow(Object element) { > >>> if (size < maxSize) { > >>> put(element); > >>> return null; > >>> } else if (size > 0 && !lessThan(element, top())) { > >>> Object ret = heap[1]; > >>> heap[1] = element; > >>> adjustTop(); > >>> return ret; > >>> } else { > >>> return element; > >>> } > >>> } > >>> [Very similar to insert(), only it returns the object that was > >>> kicked > >> out of > >>> the Queue, or null] > >> > >> -- > >> Nadav Har'El | Tuesday, Dec 11 2007, > >> 3 Tevet > >> 5768 > >> IBM Haifa Research Lab > >> |----------------------------------------- > >> |A professor is one who talks in > >> someone > >> http://nadav.harel.org.il |else's sleep. > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- Regards, Shai Erera