RE: lucene 2.9 sorting algorithm

Uwe Schindler Mon, 19 Oct 2009 23:27:52 -0700

The old search API has already been removed in trunk.


Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

  _____  

From: John Wang [mailto:john.w...@gmail.com] 
Sent: Tuesday, October 20, 2009 3:28 AM
To: java-dev@lucene.apache.org
Subject: Re: lucene 2.9 sorting algorithm

 

Hi Michael:

 

     Was wondering if you got a chance to take a look at this.

 

     Since deprecated APIs are being removed in 3.0, I was wondering if/when
we would decide whether to keep the ScoreDocComparator API, so that it would
survive into Lucene 3.0.

 

Thanks

 

-John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:

Oh, no problem...

Mike


On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.w...@gmail.com> wrote:
> Mike, just a clarification on my first perf report email.
> In the first section, numHits is incorrectly labeled; it should be 20
> instead of 50. Sorry about the possible confusion.
> Thanks
> -John
>
> On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>>
>> Thanks John; I'll have a look.
>>
>> Mike
>>
>> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.w...@gmail.com> wrote:
>> > Hi Michael:
>> >     I added classes ScoreDocComparatorQueue and OneSortNoScoreCollector
>> > as a more general case. I think keeping the old API for
>> > ScoreDocComparator and SortComparatorSource would work.
>> >   Please take a look.
>> > Thanks
>> > -John
>> >
>> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.w...@gmail.com> wrote:
>> >>
>> >> Hi Michael:
>> >>      It is open, http://code.google.com/p/lucene-book/source/checkout
>> >>      I think I sent the https url instead, sorry.
>> >>     The multi PQ sorting is fairly self-contained. I have 2 versions,
>> >> 1 for string and 1 for int; each is a Collector impl.
>> >>      I shouldn't have said the Multi Q is faster on int sort; it is
>> >> within the error margin. The difference is very, very small; I would
>> >> say they are about equal.
>> >>      If you think it is a good thing to go this way (if not for the
>> >> perf, then just for the simpler API), I'd be happy to work on a patch.
>> >> Thanks
>> >> -John
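John's Collectors themselves are not shown in this thread. As a rough, self-contained illustration of the general "multi PQ" idea (an assumption about the approach, not his code): keep one bounded queue per segment, convert segment-local doc ids with a per-segment docBase, then merge the per-segment winners into the final top N. Segments are modeled as plain int arrays standing in for per-segment field values; the class and method names (MultiQueueSortSketch, topPerSegment, search) are hypothetical. For int sorting the values compare directly across segments anyway; the per-segment split matters more for string sorting, where ordinals are only comparable within one segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of "one priority queue per segment, merge at the end".
class MultiQueueSortSketch {
    // One hit: global doc id plus its int sort value.
    static final class Hit {
        final int globalDoc;
        final int value;
        Hit(int globalDoc, int value) { this.globalDoc = globalDoc; this.value = value; }
    }

    // Ascending by value; ties broken by the smaller global doc id.
    static final Comparator<Hit> ORDER =
        Comparator.<Hit>comparingInt(h -> h.value).thenComparingInt(h -> h.globalDoc);

    // Collects the best numHits hits of one segment in its own bounded max-heap
    // (head = current worst hit), mapping local doc ids to global ones via docBase.
    static List<Hit> topPerSegment(int[] segmentValues, int docBase, int numHits) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(numHits, ORDER.reversed());
        for (int doc = 0; doc < segmentValues.length; doc++) {
            Hit hit = new Hit(docBase + doc, segmentValues[doc]);
            if (pq.size() < numHits) {
                pq.add(hit);
            } else if (ORDER.compare(hit, pq.peek()) < 0) {
                pq.poll();  // evict the current worst hit
                pq.add(hit);
            }
        }
        return new ArrayList<>(pq);
    }

    // Runs one queue per segment, then merges all per-segment winners
    // into the global top numHits (returned as global doc ids, best first).
    static int[] search(int[][] segments, int numHits) {
        if (numHits <= 0) {
            return new int[0];
        }
        List<Hit> candidates = new ArrayList<>();
        int docBase = 0;
        for (int[] segment : segments) {
            candidates.addAll(topPerSegment(segment, docBase, numHits));
            docBase += segment.length;
        }
        candidates.sort(ORDER);
        int n = Math.min(numHits, candidates.size());
        int[] docs = new int[n];
        for (int i = 0; i < n; i++) {
            docs[i] = candidates.get(i).globalDoc;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[][] segments = { {7, 3, 9}, {1, 8}, {5, 2, 6} };
        System.out.println(Arrays.toString(search(segments, 3)));  // prints [3, 6, 1]
    }
}
```

The final merge only ever sees at most numHits candidates per segment, so its cost depends on the number of segments and numHits rather than on the total number of matching documents.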
>> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
>> >> <luc...@mikemccandless.com> wrote:
>> >>>
>> >>> John, looks like this requires login -- any plans to open that up,
>> >>> or post the code on an issue?
>> >>>
>> >>> How self-contained is your Multi PQ sorting? E.g., is it a standalone
>> >>> Collector impl that I can test?
>> >>>
>> >>> Mike
>> >>>
>> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.w...@gmail.com>
>> >>> wrote:
>> >>> > BTW, we have a little sandbox for these experiments, and all my
>> >>> > test code is there. It is not very polished.
>> >>> >
>> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >>> >
>> >>> > -John
>> >>> >
>> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.w...@gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Numbers Mike requested for Int types:
>> >>> >>
>> >>> >> Only the time/cputime are posted; the others are all the same
>> >>> >> since the algorithm is the same.
>> >>> >>
>> >>> >> numHits   Lucene 2.9 time   Lucene 2.9 cpu   my test time   my test cpu
>> >>> >> 10        14619495          146126           14101094       144715
>> >>> >> 20        14550568          163242           14804821       151305
>> >>> >> 100       16467647          178379           15372157       158842
>> >>> >>
>> >>> >> Conclusions:
>> >>> >> They are very similar; the differences are all within error bounds,
>> >>> >> especially with lower PQ sizes, with the second sort algorithm
>> >>> >> again slightly faster.
>> >>> >>
>> >>> >> Hope this helps.
>> >>> >>
>> >>> >> -John
>> >>> >>
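The "within error bounds" conclusion can be put into numbers with a little arithmetic. The sketch below computes the percentage change of "my test" relative to Lucene 2.9 for each numHits row, using the figures posted in the message above. The thread does not state the units, and the class and method names are illustrative only.

```java
// Hypothetical helper: relative change of the "my test" figures vs. Lucene 2.9.
class RelativeDiffSketch {
    // Figures copied from the results posted above:
    // {numHits, 2.9 time, 2.9 cpu, my test time, my test cpu}
    static final long[][] ROWS = {
        {10, 14619495L, 146126L, 14101094L, 144715L},
        {20, 14550568L, 163242L, 14804821L, 151305L},
        {100, 16467647L, 178379L, 15372157L, 158842L},
    };

    // Percentage change of candidate relative to baseline;
    // a negative result means the candidate figure is lower.
    static double percentChange(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        for (long[] r : ROWS) {
            System.out.printf("numHits=%d  time %+.2f%%  cpu %+.2f%%%n",
                r[0], percentChange(r[1], r[3]), percentChange(r[2], r[4]));
        }
    }
}
```

Whether a given percentage falls inside the noise depends on run-to-run variance, which the thread does not report.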
>> >>> >>
>> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>> >>> >> <yo...@lucidimagination.com>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >>> >>> <luc...@mikemccandless.com> wrote:
>> >>> >>> > Though it'd be odd if the switch to searching by segment
>> >>> >>> > really was most of the gains here.
>> >>> >>>
>> >>> >>> I had assumed that much of the improvement was due to ditching
>> >>> >>> MultiTermEnum/MultiTermDocs.
>> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
>> >>> >>> with queries that use a TermEnum (range, prefix, etc.).
>> >>> >>>
>> >>> >>> -Yonik
>> >>> >>> http://www.lucidimagination.com
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> ---------------------------------------------------------------------
>> >>> >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >>> >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>> >>> >>>
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>>
>
>
