Re: DisjunctionScorer performance

John Wang Tue, 06 Jan 2009 23:56:21 -0800

Paul:

       Our very simple/naive testing methodology for OrDocIdSetIterator:


5 sub-iterators, each of which just iterates from 0 to 1,000,000.

The test iterates the OrDocIdSetIterator until next() is false.
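
For illustration only, the test described above could be sketched roughly as follows. This is not the actual OrDocIdSetIterator code: RangeIterator, OrIteratorSketch and countUnion are hypothetical names, the range is assumed to be doc ids 0 through 999,999, and a plain java.util.PriorityQueue stands in for whatever structure the tuned implementation uses.

```java
import java.util.PriorityQueue;

// Hypothetical stand-in for a sub-iterator: walks doc ids start..end-1.
final class RangeIterator {
    private final int end; // exclusive upper bound (an assumption)
    private int doc;

    RangeIterator(int start, int end) {
        this.doc = start - 1;
        this.end = end;
    }

    boolean next() { return ++doc < end; }

    int doc() { return doc; }
}

// Hypothetical OR iterator: a priority queue keeps the sub-iterators
// ordered by their current doc id.
final class OrIteratorSketch {
    private final PriorityQueue<RangeIterator> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private int current = -1;

    OrIteratorSketch(RangeIterator... subs) {
        for (RangeIterator sub : subs) {
            if (sub.next()) {
                queue.add(sub); // keep only sub-iterators that have a first doc
            }
        }
    }

    boolean next() {
        // Move every sub-iterator still sitting on the current doc forward,
        // so each doc id in the union is reported once.
        while (!queue.isEmpty() && queue.peek().doc() <= current) {
            RangeIterator top = queue.poll();
            if (top.next()) {
                queue.add(top);
            }
        }
        if (queue.isEmpty()) {
            return false;
        }
        current = queue.peek().doc();
        return true;
    }

    int doc() { return current; }

    // Builds numSubs identical ranges and iterates the union until next() is false.
    static int countUnion(int numSubs, int maxDoc) {
        RangeIterator[] subs = new RangeIterator[numSubs];
        for (int i = 0; i < numSubs; i++) {
            subs[i] = new RangeIterator(0, maxDoc);
        }
        OrIteratorSketch or = new OrIteratorSketch(subs);
        int count = 0;
        while (or.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int count = countUnion(5, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " docs in " + elapsedMs + " ms");
    }
}
```

A single timed pass like this is only a rough indicator; warm-up runs and repeated measurements would give steadier numbers.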

      Do you want me to run the same test against DisjunctionDISI?

-John

On Tue, Jan 6, 2009 at 11:48 PM, Paul Elschot <[email protected]> wrote:

>  On Wednesday 07 January 2009 07:36:06 John Wang wrote:
>
> > Hi guys:
>
> >
>
> > We have been building a suite of boolean operators DocIdSets
>
> > (e.g. AndDocIdSet/Iterator, OrDocIdSet/Iterator,
>
> > NotDocIdSet/Iterator). We compared our implementation on the
>
> > OrDocIdSetIterator (based on DisjunctionMaxScorer code) with some
>
> > code tuning, and we see the performance doubled in our testing.
>
> That's good news.
>
> What data structure did you use for sorting by doc id?
>
> Currently a priority queue is used for that, and normally that is
>
> the bottleneck for performance.
>
> > (we
>
> > haven't done comparisons with ConjunctionScorer vs.
>
> > AndDocIdSetIterator, will post numbers when we do)
>
> >
>
> > We'd be happy to contribute this back to the community. But what
>
> > is the best way of going about it?
>
> >
>
> > option 1: merge our change into DisjunctionMax/SumScorers.
>
> > option 2: contribute boolean operator sets, and have
>
> > DisjunctionScorers derive from OrDocIdSetIterator, ConjunctionScorer
>
> > derive from AndDocIdSetIterator etc.
>
> >
>
> > Option 2 seems to be cleaner. Thoughts?
>
> Some theoretical performance improvement is possible when the
>
> minimum number of required scorers/iterators is higher than 1,
>
> by using skipTo() (as much as possible) instead of next() in
>
> such cases. For the moment that's theoretical because there
>
> is no working implementation of this yet, but have a look at
>
> LUCENE-1345.
>
> I'm currently working on a DisjunctionDISI, probably the same function as
> the OrDocIdSetIterator you mentioned above. In case you have
>
> something faster than that, could you post it at LUCENE-1345 or at a
>
> new issue?
>
> An AndDocIdSetIterator could also be useful for the PhraseScorers and
>
> for the SpanNear queries, but that is of later concern.
>
> So I'd prefer option 2.
>
> Regards,
>
> Paul Elschot
>
>
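
As a rough illustration of the skipTo() idea quoted above, here is a minimal sketch of the extreme case where every sub-iterator is required, i.e. an AND iterator. It is not the code from LUCENE-1345 or from any AndDocIdSetIterator: SortedDocs, AndIteratorSketch and intersect are hypothetical names, and skipTo() here is assumed to mean "move to the first doc id >= target, never backward", implemented as a simple linear scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sub-iterator over an ascending array of unique doc ids.
final class SortedDocs {
    private final int[] docs;
    private int pos = 0;

    SortedDocs(int[] docs) { this.docs = docs; }

    // Assumed semantics: position on the first doc id >= target and
    // return false when the list is exhausted. A real implementation
    // would use skip data rather than a linear scan.
    boolean skipTo(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length;
    }

    // Only valid after skipTo() has returned true.
    int doc() { return docs[pos]; }
}

final class AndIteratorSketch {
    // Returns the doc ids present in every list, using skipTo() to jump
    // each sub-iterator to the current candidate instead of calling next().
    static List<Integer> intersect(int[]... lists) {
        List<Integer> result = new ArrayList<>();
        if (lists.length == 0) {
            return result;
        }
        SortedDocs[] subs = new SortedDocs[lists.length];
        for (int i = 0; i < lists.length; i++) {
            subs[i] = new SortedDocs(lists[i]);
        }
        int candidate = 0; // doc ids are assumed to be non-negative
        while (true) {
            boolean agreed = true;
            for (SortedDocs sub : subs) {
                if (!sub.skipTo(candidate)) {
                    return result; // one list is exhausted, so no more matches
                }
                if (sub.doc() > candidate) {
                    candidate = sub.doc(); // raise the candidate and re-check the others
                    agreed = false;
                }
            }
            if (agreed) {
                result.add(candidate);
                candidate++;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(intersect(
                new int[] {1, 3, 5, 7, 9},
                new int[] {3, 4, 5, 9, 10},
                new int[] {0, 3, 9}));
    }
}
```

The case of a minimum number of required iterators between 1 and all of them is more involved and is not covered by this sketch.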
