Re: lucene 2.9 sorting algorithm

John Wang Thu, 15 Oct 2009 18:52:54 -0700

Hi Michael:
     It is open, http://code.google.com/p/lucene-book/source/checkout


     I think I sent the https url instead, sorry.

    The multi-PQ sorting is fairly self-contained. I have two versions, one for
string and one for int; each is a Collector impl.
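For readers who haven't checked out the sandbox, here is a rough, hypothetical sketch of the multi-PQ idea for an int sort. It is not the Collector code in the repository above and it does not use Lucene classes. It assumes each segment fills its own bounded priority queue and the per-segment queues are merged once at the end. The class `MultiPQIntSort`, the `Hit` record, and the methods `setNextSegment`, `collect` and `topHits` are illustrative names only. They loosely echo the shape of Lucene 2.9's `Collector` (`setNextReader(IndexReader, int docBase)` / `collect(int doc)`), and plain int values stand in for a field cache.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of multi-PQ top-N sorting on an int field (Java 16+ for records):
// one bounded queue per segment, merged once after all segments are collected.
public class MultiPQIntSort {

    /** A hit: global doc id plus its int sort value. */
    record Hit(int doc, int value) {}

    // Ascending by value; this sketch breaks ties by ascending global doc id.
    static final Comparator<Hit> ORDER =
        Comparator.comparingInt(Hit::value).thenComparingInt(Hit::doc);

    private final int numHits;
    private final List<PriorityQueue<Hit>> segmentQueues = new ArrayList<>();
    private PriorityQueue<Hit> current;
    private int docBase;

    public MultiPQIntSort(int numHits) {
        if (numHits < 1) throw new IllegalArgumentException("numHits must be >= 1");
        this.numHits = numHits;
    }

    /** Begin a new segment: every segment gets its own fresh queue. */
    public void setNextSegment(int docBase) {
        this.docBase = docBase;
        // Reversed order: the head is the worst hit currently kept in this segment.
        current = new PriorityQueue<>(numHits, ORDER.reversed());
        segmentQueues.add(current);
    }

    /** Collect a segment-local doc id together with its int value. */
    public void collect(int localDoc, int value) {
        Hit hit = new Hit(docBase + localDoc, value);
        if (current.size() < numHits) {
            current.add(hit);
        } else if (ORDER.compare(hit, current.peek()) < 0) {
            current.poll(); // evict this segment's current worst hit
            current.add(hit);
        }
    }

    /** Merge all per-segment queues and return the global top numHits in sort order. */
    public List<Hit> topHits() {
        List<Hit> all = new ArrayList<>();
        for (PriorityQueue<Hit> q : segmentQueues) {
            all.addAll(q);
        }
        all.sort(ORDER);
        return all.subList(0, Math.min(numHits, all.size()));
    }
}
```

The trade-off in this sketch is that each `collect()` only compares against a segment-local bottom entry. The cost is one extra merge of at most (segments × numHits) entries at the end.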

     I shouldn't have said the multi-PQ is faster on int sort; the difference
is within the error bounds. It is very small, so I would say they are about
equal.
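The thread doesn't say what units the time and cpu figures are in, or how much they vary from run to run. As a rough way to put "very small" into numbers, this sketch takes the int-sort figures posted further down in this thread and computes each multi-PQ figure as a percentage difference from Lucene 2.9. The class and method names are illustrative only.

```java
// Percentage difference of the multi-PQ int-sort run vs. Lucene 2.9,
// using the time/cpu figures posted in this thread (units not stated there).
public class RelativeDiff {

    /** 100 * (candidate - baseline) / baseline; negative means candidate is lower. */
    static double percentDiff(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        int[] numHits = {10, 20, 100};
        long[] luceneTime = {14619495L, 14550568L, 16467647L};
        long[] multiPqTime = {14101094L, 14804821L, 15372157L};
        long[] luceneCpu = {146126L, 163242L, 178379L};
        long[] multiPqCpu = {144715L, 151305L, 158842L};

        for (int i = 0; i < numHits.length; i++) {
            System.out.printf("numHits=%d  time %+.2f%%  cpu %+.2f%%%n",
                numHits[i],
                percentDiff(luceneTime[i], multiPqTime[i]),
                percentDiff(luceneCpu[i], multiPqCpu[i]));
        }
    }
}
```

A negative value means the multi-PQ run reported the lower figure. With only one run per setting, these percentages alone can't show the size of the error bounds.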

     If you think this is a good direction (if not for the perf, then for
the simpler API), I'd be happy to work on a patch.

Thanks

-John

On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless <[email protected]> wrote:

> John, looks like this requires login -- any plans to open that up, or
> post the code on an issue?
>
> How self-contained is your multi-PQ sorting?  E.g., is it a standalone
> Collector impl that I can test?
>
> Mike
>
> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <[email protected]> wrote:
> > BTW, we have a little sandbox for these experiments, and all my test code
> > is there. It is not very polished.
> >
> > https://lucene-book.googlecode.com/svn/trunk
> >
> > -John
> >
> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <[email protected]> wrote:
> >>
> >> Here are the numbers Mike requested for int types.
> >>
> >> Only time and CPU time are posted; the other stats are all the same,
> >> since the algorithm is the same.
> >>
> >> numHits | Lucene 2.9 time | Lucene 2.9 cpu | my test time | my test cpu
> >> --------+-----------------+----------------+--------------+------------
> >>      10 |        14619495 |         146126 |     14101094 |      144715
> >>      20 |        14550568 |         163242 |     14804821 |      151305
> >>     100 |        16467647 |         178379 |     15372157 |      158842
> >>
> >> Conclusions:
> >> They are very similar; the differences are all within error bounds,
> >> especially with smaller PQ sizes, with the second sort algorithm again
> >> slightly faster.
> >>
> >> Hope this helps.
> >>
> >> -John
> >>
> >>
> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley <[email protected]>
> >> wrote:
> >>>
> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
> >>> <[email protected]> wrote:
> >>> > Though it'd be odd if the switch to searching by segment
> >>> > really was most of the gains here.
> >>>
> >>> I had assumed that much of the improvement was due to ditching
> >>> MultiTermEnum/MultiTermDocs.
> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
> >>> with queries that use a TermEnum (range, prefix, etc).
> >>>
> >>> -Yonik
> >>> http://www.lucidimagination.com
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>
> >
> >
>
