Hi Michael:

I added two classes, ScoreDocComparatorQueue and OneSortNoScoreCollector, as a more general case. I think keeping the old API for ScoreDocComparator and SortComparatorSource would work.
Please take a look.

Thanks

-John

On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.w...@gmail.com> wrote:

> Hi Michael:
>
> It is open: http://code.google.com/p/lucene-book/source/checkout
>
> I think I sent the https URL instead, sorry.
>
> The multi PQ sorting is fairly self-contained. I have 2 versions, 1 for
> string and 1 for int; each is a Collector impl.
>
> I shouldn't say the Multi Q is faster on int sort; it is within the
> error boundary. The diff is very, very small; I would say they are more
> or less equal.
>
> If you think it is a good thing to go this way (if not for the perf,
> just for the simpler API), I'd be happy to work on a patch.
>
> Thanks
>
> -John
>
> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> John, looks like this requires login -- any plans to open that up, or
>> post the code on an issue?
>>
>> How self-contained is your Multi PQ sorting? EG is it a standalone
>> Collector impl that I can test?
>>
>> Mike
>>
>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.w...@gmail.com> wrote:
>> > BTW, we have a little sandbox for these experiments, and all my
>> > test code is at the URL below. It is not very polished.
>> >
>> > https://lucene-book.googlecode.com/svn/trunk
>> >
>> > -John
>> >
>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.w...@gmail.com> wrote:
>> >>
>> >> Numbers Mike requested for int types:
>> >>
>> >> Only the time/cputime are posted; the others are all the same since
>> >> the algorithm is the same.
>> >>
>> >> Lucene 2.9:
>> >> numHits: 10
>> >> time: 14619495
>> >> cpu: 146126
>> >>
>> >> numHits: 20
>> >> time: 14550568
>> >> cpu: 163242
>> >>
>> >> numHits: 100
>> >> time: 16467647
>> >> cpu: 178379
>> >>
>> >>
>> >> my test:
>> >> numHits: 10
>> >> time: 14101094
>> >> cpu: 144715
>> >>
>> >> numHits: 20
>> >> time: 14804821
>> >> cpu: 151305
>> >>
>> >> numHits: 100
>> >> time: 15372157
>> >> cpu: 158842
>> >>
>> >> Conclusions:
>> >> They are very similar; the differences are all within error bounds,
>> >> especially with lower PQ sizes, where the second sort alg is again
>> >> slightly faster.
>> >>
>> >> Hope this helps.
>> >>
>> >> -John
>> >>
>> >>
>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley <
>> >> yo...@lucidimagination.com> wrote:
>> >>>
>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >>> <luc...@mikemccandless.com> wrote:
>> >>> > Though it'd be odd if the switch to searching by segment
>> >>> > really was most of the gains here.
>> >>>
>> >>> I had assumed that much of the improvement was due to ditching
>> >>> MultiTermEnum/MultiTermDocs.
>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
>> >>> with queries that use a TermEnum (range, prefix, etc).
>> >>>
>> >>> -Yonik
>> >>> http://www.lucidimagination.com
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
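
For readers following the thread, here is a minimal sketch of what a "multi PQ" sort could look like, assuming it means one bounded priority queue per segment whose contents are merged once at the end. That reading is an assumption; the thread does not spell out the algorithm. The class and method names (MultiQueueTopN, collectSegment, topN) are hypothetical, plain int arrays stand in for per-segment sort values, and nothing here is taken from ScoreDocComparatorQueue, OneSortNoScoreCollector, or any Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: hypothetical names, not code from the thread or from Lucene. */
public class MultiQueueTopN {

    /** Keeps the n smallest values of one segment in a bounded max-heap. */
    static PriorityQueue<Integer> collectSegment(int[] segmentValues, int n) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        if (n <= 0) {
            return pq; // nothing to keep
        }
        for (int v : segmentValues) {
            if (pq.size() < n) {
                pq.add(v);               // queue not full yet
            } else if (v < pq.peek()) {
                pq.poll();               // evict the current worst entry
                pq.add(v);
            }
        }
        return pq;
    }

    /** Merges the per-segment queues and returns the global n smallest values, ascending. */
    static List<Integer> topN(int[][] segments, int n) {
        List<Integer> merged = new ArrayList<>();
        for (int[] segment : segments) {
            merged.addAll(collectSegment(segment, n));
        }
        Collections.sort(merged);
        int size = Math.max(0, Math.min(n, merged.size()));
        return new ArrayList<>(merged.subList(0, size));
    }

    public static void main(String[] args) {
        // Three made-up "segments" of sort values.
        int[][] segments = { {42, 7, 19}, {3, 88, 15}, {27, 1} };
        System.out.println(topN(segments, 3));
    }
}
```

Under this assumption, each queue only ever compares values from its own segment, and the cross-segment work is limited to the final merge of at most n entries per segment.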