Paul: Our very simple/naive testing methodology for OrDocIdSetIterator:
5 sub-iterators, each of which just iterates from 0 to 1,000,000. The test iterates the OrDocIdSetIterator until next() returns false.

Do you want me to run the same test against DisjunctDisi?

-John

On Tue, Jan 6, 2009 at 11:48 PM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:

> On Wednesday 07 January 2009 07:36:06 John Wang wrote:
> > > Hi guys:
> > >
> > > We have been building a suite of boolean operators DocIdSets
> > > (e.g. AndDocIdSet/Iterator, OrDocIdSet/Iterator,
> > > NotDocIdSet/Iterator). We compared our implementation on the
> > > OrDocIdSetIterator (based on DisjunctionMaxScorer code) with some
> > > code tuning, and we see the performance doubled in our testing.
> > That's good news.
> > What data structure did you use for sorting by doc id?
> > Currently a priority queue is used for that, and normally that is
> > the bottleneck for performance.
> > > (we
> > > haven't done comparisons with ConjunctionScorer vs.
> > > AndDocIdSetIterator, will post numbers when we do)
> > >
> > > We'd be happy to contribute this back to the community. But what
> > > is the best way of going about it?
> > >
> > > option 1: merge our change into DisjunctionMax/SumScorers.
> > > option 2: contribute boolean operator sets, and have
> > > DisjunctionScorers derive from OrDocIdSetIterator, ConjunctionScorer
> > > derive from AndDocIdSetIterator etc.
> > >
> > > Option 2 seems to be cleaner. Thoughts?
> > Some theoretical performance improvement is possible when the
> > minimum number of required scorers/iterators is higher than 1,
> > by use of skipTo() (as much as possible) instead of next() in
> > such cases. For the moment that's theoretical because there
> > is no working implementation of this yet, but have a look at
> > LUCENE-1345.
> > I'm currently working on a DisjunctionDISI, probably the same function as
> > the OrDocIdSetIterator you mentioned above. In case you have
> > something faster than that, could you post it at LUCENE-1345 or at a
> > new issue?
> > An AndDocIdSetIterator could also be useful for the PhraseScorers and
> > for the SpanNear queries, but that is of later concern.
> > So I'd prefer option 2.
> > Regards,
> > Paul Elschot
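
For reference, the test methodology described at the top of this message can be sketched roughly as below. This is only a minimal, self-contained illustration and not the actual OrDocIdSetIterator or any Lucene class: SimpleDocIdIterator, RangeIterator, OrIterator and OrIteratorBenchmark are hypothetical names, the union is built on java.util.PriorityQueue (the priority-queue approach Paul mentions as the current one), and the range is assumed to be 0 inclusive to 1,000,000 exclusive.

```java
import java.util.PriorityQueue;

/** Hypothetical stand-in for a doc id iterator; NOT the Lucene API. */
interface SimpleDocIdIterator {
    /** Advances to the next doc id; returns false when exhausted. */
    boolean next();

    /** Current doc id; only meaningful after next() returned true. */
    int doc();
}

/** Iterates 0 .. maxDoc-1 (assumed exclusive upper bound). */
final class RangeIterator implements SimpleDocIdIterator {
    private final int maxDoc;
    private int doc = -1;

    RangeIterator(int maxDoc) { this.maxDoc = maxDoc; }

    public boolean next() {
        if (doc + 1 >= maxDoc) return false;
        doc++;
        return true;
    }

    public int doc() { return doc; }
}

/** Priority-queue based union: yields each doc id once, in increasing order. */
final class OrIterator implements SimpleDocIdIterator {
    private final PriorityQueue<SimpleDocIdIterator> queue;
    private int doc = -1;

    OrIterator(SimpleDocIdIterator... subs) {
        queue = new PriorityQueue<>(Math.max(1, subs.length),
                (a, b) -> Integer.compare(a.doc(), b.doc()));
        // Position each sub-iterator on its first doc id before queueing it.
        for (SimpleDocIdIterator sub : subs) {
            if (sub.next()) queue.add(sub);
        }
    }

    public boolean next() {
        // Advance every sub-iterator still positioned on the doc id last returned.
        while (!queue.isEmpty() && queue.peek().doc() <= doc) {
            SimpleDocIdIterator top = queue.poll();
            if (top.next()) queue.add(top);
        }
        if (queue.isEmpty()) return false;
        doc = queue.peek().doc();
        return true;
    }

    public int doc() { return doc; }
}

final class OrIteratorBenchmark {
    /** Builds numSubs identical range iterators and counts the docs in their union. */
    static long run(int numSubs, int maxDoc) {
        SimpleDocIdIterator[] subs = new SimpleDocIdIterator[numSubs];
        for (int i = 0; i < numSubs; i++) {
            subs[i] = new RangeIterator(maxDoc);
        }
        OrIterator or = new OrIterator(subs);
        long count = 0;
        while (or.next()) count++;   // iterate until next() returns false
        return count;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long count = run(5, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " docs in " + elapsedMs + " ms");
    }
}
```

Since all five sub-iterators cover the same range, every doc id forces five queue updates, which makes this close to a worst case for the priority queue and says little about sparse or skewed inputs.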