[ https://issues.apache.org/jira/browse/LUCENE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571939#action_12571939 ]
Eks Dev commented on LUCENE-1187:
---------------------------------

Paul, I think there is one CHECKME in DisjunctionSumScorer that I stumbled upon recently, when I realized that a (token1+ token2+) query works way faster than (token1 token2).setMinimumNumberShouldMatch(2). It is not directly related to LUCENE-584; this is just a reminder.

Also, I think there is a hard_to_detect_small_maybe_performance_bug in ConjunctionScorer:

{code:java}
    // If first-time skip distance is any predictor of
    // scorer sparseness, then we should always try to skip first on
    // those scorers.
    // Keep last scorer in it's last place (it will be the first
    // to be skipped on), but reverse all of the others so that
    // they will be skipped on in order of original high skip.
    int end=(scorers.length-1)-1;
    for (int i=0; i<(end>>1); i++) {
      Scorer tmp = scorers[i];
      scorers[i] = scorers[end-i];
      scorers[end-i] = tmp;
    }
{code}

It has not been detected so far because it has only performance implications (I think?), and it sometimes works and sometimes does not, depending on the number of scorers. To see what I am talking about, try this "simulator":

{code:java}
import java.util.Arrays;

// Wrapper class added only so that the snippet compiles on its own.
public class Simulator {
  public static void main(String[] args) {
    int[] scorers = new int[7]; //3 and 7 do not work
    for (int i=0; i<scorers.length; i++) {
      scorers[i]=i;
    }
    System.out.println(Arrays.toString(scorers));

    // Same loop as in ConjunctionScorer, applied to ints instead of Scorers.
    int end=(scorers.length-1)-1;
    for (int i=0; i<(end>>1); i++) {
      int tmp = scorers[i];
      scorers[i] = scorers[end-i];
      scorers[end-i] = tmp;
    }
    System.out.println(Arrays.toString(scorers));
  }
}
{code}

For 7 you get:

[0, 1, 2, 3, 4, 5, 6]
[5, 4, 2, 3, 1, 0, 6]

instead of [5, 4, 3, 2, 1, 0, 6],

and for 3:

[0, 1, 2]
[0, 1, 2]

(should be [1, 0, 2])

> Things to be done now that Filter is independent from BitSet
> ------------------------------------------------------------
>
>                 Key: LUCENE-1187
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1187
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Priority: Minor
>
> (Aside: where is the documentation on how to mark up text in jira comments?)
> The following things are left over after LUCENE-584 :
> For Lucene 3.0 Filter.bits() will have to be removed.
> There is a CHECKME in IndexSearcher about using ConjunctionScorer to have the
> boolean behaviour of a Filter.
> I have not looked into Filter caching yet, but I suppose there will be some
> room for improvement there.
> Iirc the current core has moved to use OpenBitSetFilter and that is probably
> what is being cached.
> In some cases it might be better to cache a SortedVIntList instead.
> Boolean logic on DocIdSetIterator is already available for Scorers (that
> inherit from DocIdSetIterator) in the search package. This is currently
> implemented by ConjunctionScorer, DisjunctionSumScorer,
> ReqOptSumScorer and ReqExclScorer.
> Boolean logic on BitSets is available in contrib/misc and contrib/queries.
> DisjunctionSumScorer calls score() on its subscorers before the score value
> is actually needed.
> This could be a reason to introduce a DisjunctionDocIdSetIterator, perhaps as
> a superclass of DisjunctionSumScorer.
> To fully implement non scoring queries a TermDocIdSetIterator will be needed,
> perhaps as a superclass of TermScorer.
> The javadocs in org.apache.lucene.search using matching vs non-zero score:
> I'll investigate this soon, and provide a patch when necessary.
> An early version of the patches of LUCENE-584 contained a class Matcher,
> that differs from the current DocIdSet in that Matcher has an explain()
> method.
> It remains to be seen whether such a Matcher could be useful between
> DocIdSet and Scorer.
> The semantics of scorer.skipTo(scorer.doc()) was discussed briefly.
> This was also discussed at another issue recently, so perhaps it is worthwhile
> to open a separate issue for this.
> Skipping on a SortedVIntList is done using linear search, this could be
> improved by adding multilevel skiplist info much like in the Lucene index for
> documents containing a term.
> One comment by me of 3 Dec 2008:
> A few complete (test) classes are deprecated, it might be good to add the
> target release for removal there.
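
A note on the reversal loop from the comment above, as a sketch only. The loop performs end>>1 swaps, but reversing the end+1 elements at indices 0..end takes (end+1)>>1 swaps, so the middle pair is skipped whenever end is odd. That would be the case for every odd number of scorers from 3 upwards (5 as well as the 3 and 7 mentioned). One possible fix is to keep swapping while i < end-i. The version below applies that to the int[] "simulator", not to the actual ConjunctionScorer code; the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Lucene.
class ReverseAllButLast {

  // Reverses a[0..a.length-2] in place and leaves the last element in its place.
  static void reverseAllButLast(int[] a) {
    int end = (a.length - 1) - 1;        // index of the last element to reverse
    for (int i = 0; i < end - i; i++) {  // stop once the two indices meet or cross
      int tmp = a[i];
      a[i] = a[end - i];
      a[end - i] = tmp;
    }
  }

  public static void main(String[] args) {
    // The odd lengths are the ones the original loop bound gets wrong.
    for (int n : new int[] {3, 5, 7}) {
      int[] a = new int[n];
      for (int i = 0; i < n; i++) {
        a[i] = i;
      }
      reverseAllButLast(a);
      System.out.println(Arrays.toString(a));
    }
  }
}
```

With this bound the three arrays come out as [1, 0, 2], [3, 2, 1, 0, 4] and [5, 4, 3, 2, 1, 0, 6]. Writing the bound as i < ((end+1)>>1) should be equivalent; the i < end-i form was chosen here only because it does not depend on getting the rounding right.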