Re: lucene 2.9 sorting algorithm

Michael McCandless Fri, 16 Oct 2009 03:21:48 -0700

Thanks John; I'll have a look.

Mike


On Fri, Oct 16, 2009 at 12:57 AM, John Wang <[email protected]> wrote:
> Hi Michael:
>     I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as
> a more general case. I think keeping the old api for ScoreDocComparator and
> SortComparatorSource would work.
>   Please take a look.
> Thanks
> -John
>
> On Thu, Oct 15, 2009 at 6:52 PM, John Wang <[email protected]> wrote:
>>
>> Hi Michael:
>>      It is open, http://code.google.com/p/lucene-book/source/checkout
>>      I think I sent the https url instead, sorry.
>>     The multi PQ sorting is fairly self-contained. I have 2 versions, 1
>> for string and 1 for int; each is a Collector impl.
>>      I shouldn't say the Multi PQ is faster on int sort; the difference is
>> within the error bounds. The diff is very small, so I would say the two
>> are roughly equal.
>>      If you think it is a good thing to go this way, (if not for the perf,
>> just for the simpler api) I'd be happy to work on a patch.
>> Thanks
>> -John
>> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
>> <[email protected]> wrote:
>>>
>>> John, looks like this requires login -- any plans to open that up, or,
>>> post the code on an issue?
>>>
>>> How self-contained is your Multi PQ sorting?  EG is it a standalone
>>> Collector impl that I can test?
>>>
>>> Mike
>>>
>>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <[email protected]> wrote:
>>> > BTW, we have a little sandbox for these experiments, and all my test
>>> > code is there. It is not very polished.
>>> >
>>> > https://lucene-book.googlecode.com/svn/trunk
>>> >
>>> > -John
>>> >
>>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <[email protected]> wrote:
>>> >>
>>> >> Numbers Mike requested for Int types:
>>> >>
>>> >> only the time/cputime are posted, others are all the same since the
>>> >> algorithm is the same.
>>> >>
>>> >> Lucene 2.9:
>>> >> numhits: 10
>>> >> time: 14619495
>>> >> cpu: 146126
>>> >>
>>> >> numhits: 20
>>> >> time: 14550568
>>> >> cpu: 163242
>>> >>
>>> >> numhits: 100
>>> >> time: 16467647
>>> >> cpu: 178379
>>> >>
>>> >>
>>> >> my test:
>>> >> numHits: 10
>>> >> time: 14101094
>>> >> cpu: 144715
>>> >>
>>> >> numHits: 20
>>> >> time: 14804821
>>> >> cpu: 151305
>>> >>
>>> >> numHits: 100
>>> >> time: 15372157
>>> >> cpu: 158842
>>> >>
>>> >> Conclusions:
>>> >> They are very similar; the differences are all within error bounds,
>>> >> especially with lower PQ sizes, with the second sort alg again slightly
>>> >> faster.
>>> >>
>>> >> Hope this helps.
>>> >>
>>> >> -John
>>> >>
>>> >>
>>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>>> >> <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>>> >>> <[email protected]> wrote:
>>> >>> > Though it'd be odd if the switch to searching by segment
>>> >>> > really was most of the gains here.
>>> >>>
>>> >>> I had assumed that much of the improvement was due to ditching
>>> >>> MultiTermEnum/MultiTermDocs.
>>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
>>> >>> with queries that use a TermEnum (range, prefix, etc).
>>> >>>
>>> >>> -Yonik
>>> >>> http://www.lucidimagination.com
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe, e-mail: [email protected]
>>> >>> For additional commands, e-mail: [email protected]
>>> >>>
>>> >>
>>> >
>>> >
>>>
>>
>
>
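
For anyone reading along, here is a rough illustration of the "multi PQ" idea discussed in this thread, as I understand it: keep one bounded priority queue per segment, then merge the per-segment winners into the final top N. This is only a sketch under assumptions. It is not John's Collector code and it does not use Lucene's API; the class name MultiPQSketch, the method topN, and the plain int arrays standing in for segments are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a multi-PQ sort: no Lucene classes are used.
public class MultiPQSketch {

    // Keep at most numHits smallest values in a max-heap (head = worst kept value).
    private static void offerBounded(PriorityQueue<Integer> pq, int value, int numHits) {
        if (pq.size() < numHits) {
            pq.add(value);
        } else if (value < pq.peek()) {
            pq.poll();
            pq.add(value);
        }
    }

    /** Returns the numHits smallest values across all segments, in ascending order. */
    public static int[] topN(int[][] segments, int numHits) {
        if (numHits <= 0) {
            return new int[0];
        }
        // Phase 1: one bounded queue per segment (each int[] plays the role of a segment).
        List<PriorityQueue<Integer>> perSegment = new ArrayList<>();
        for (int[] segment : segments) {
            PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.reverseOrder());
            for (int value : segment) {
                offerBounded(pq, value, numHits);
            }
            perSegment.add(pq);
        }
        // Phase 2: merge the per-segment winners into one bounded queue.
        PriorityQueue<Integer> merged = new PriorityQueue<>(Comparator.reverseOrder());
        for (PriorityQueue<Integer> pq : perSegment) {
            for (int value : pq) {
                offerBounded(merged, value, numHits);
            }
        }
        // The max-heap yields the largest first, so fill the result from the back.
        int[] result = new int[merged.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = merged.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] segments = {{5, 1, 9}, {3, 7}, {2, 8, 4}};
        System.out.println(Arrays.toString(topN(segments, 3))); // prints [1, 2, 3]
    }
}
```

The merge step only touches at most numHits entries per segment, so its cost stays small compared with the per-document work in phase 1.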

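To put the "within error bounds" remark in perspective, the posted time figures can be turned into signed relative differences. The numbers below are copied from the two tables in John's message; the class name RelativeDiff and the method percentDiff are hypothetical, and the units of the time figures are not stated in the thread, which does not matter for a ratio.

```java
// Hypothetical helper: compares the "time" figures posted in the thread.
public class RelativeDiff {

    /** Signed difference of candidate relative to baseline, in percent. */
    public static double percentDiff(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        int[] numHits = {10, 20, 100};
        // "Lucene 2.9" time values as posted.
        long[] lucene29 = {14619495L, 14550568L, 16467647L};
        // "my test" time values as posted.
        long[] myTest = {14101094L, 14804821L, 15372157L};

        for (int i = 0; i < numHits.length; i++) {
            // Negative means "my test" took less time than Lucene 2.9.
            System.out.printf("numHits=%d: %+.2f%%%n",
                    numHits[i], percentDiff(lucene29[i], myTest[i]));
        }
    }
}
```

A single run per configuration cannot say whether gaps of this size are noise; repeated runs would be needed to estimate the error bounds.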