[jira] Commented: (LUCENE-1187) Things to be done now that Filter is independent from BitSet

Eks Dev (JIRA) Sun, 24 Feb 2008 12:18:02 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571939#action_12571939
 ]


Eks Dev commented on LUCENE-1187:
---------------------------------

Paul, I think there is one CHECKME in DisjunctionSumScorer that I stumbled upon 
recently, when I realized that a (token1+ token2+) query runs much faster than 
(token1 token2).setMinimumNumberShouldMatch(2). It is not directly related to 
LUCENE-584; this is just a reminder. 

Also, I think there is a hard_to_detect_small_maybe_performance_bug in 
ConjunctionScorer:
{code:java}
    // If first-time skip distance is any predictor of
    // scorer sparseness, then we should always try to skip first on
    // those scorers.
    // Keep last scorer in it's last place (it will be the first
    // to be skipped on), but reverse all of the others so that
    // they will be skipped on in order of original high skip.
    int end=(scorers.length-1)-1;
    for (int i=0; i<(end>>1); i++) {
      Scorer tmp = scorers[i];
      scorers[i] = scorers[end-i];
      scorers[end-i] = tmp;
    }
{code}


It has gone undetected so far because it only has performance implications (I 
think?), and because it works for some numbers of scorers and not for others.

To see what I am talking about, try this "simulator":

{code:java}
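import java.util.Arrays;

public class ReversalSimulator {
  public static void main(String[] args) {
    int[] scorers = new int[7]; // 3 and 7 do not work

    // Fill with 0..n-1 so the final order is easy to read.
    for (int i = 0; i < scorers.length; i++) {
      scorers[i] = i;
    }

    System.out.println(Arrays.toString(scorers));

    // Same loop as in ConjunctionScorer, copied unchanged to show the problem.
    int end = (scorers.length - 1) - 1;
    for (int i = 0; i < (end >> 1); i++) {
      int tmp = scorers[i];
      scorers[i] = scorers[end - i];
      scorers[end - i] = tmp;
    }

    System.out.println(Arrays.toString(scorers));
  }
}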
{code}
For 7 you get:
[0, 1, 2, 3, 4, 5, 6]
[5, 4, 2, 3, 1, 0, 6]

instead of [5, 4, 3, 2, 1, 0, 6],

and for 3:
[0, 1, 2]
[0, 1, 2] (should be [1, 0, 2])
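
A possible fix, sketched here as an assumption and not as the actual patch: the 
range to reverse runs from 0 to end inclusive, so it holds end+1 elements and 
needs (end+1)>>1 swaps. The loop above does only end>>1 swaps, which is one too 
few whenever the number of scorers is odd. The class and method names below 
(ReverseAllButLast, reverseAllButLast) are hypothetical, and plain ints stand in 
for Scorer objects, as in the simulator.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: reverses every element except the last.
class ReverseAllButLast {

  // Reverses scorers[0..length-2] in place and leaves the last element where it is.
  static void reverseAllButLast(int[] scorers) {
    int end = (scorers.length - 1) - 1; // index of the last element that may move
    // The range [0, end] has end+1 elements, so it needs (end+1)/2 swaps.
    for (int i = 0; i < ((end + 1) >> 1); i++) {
      int tmp = scorers[i];
      scorers[i] = scorers[end - i];
      scorers[end - i] = tmp;
    }
  }

  public static void main(String[] args) {
    // Print the result for several sizes, odd and even.
    for (int n = 1; n <= 7; n++) {
      int[] scorers = new int[n];
      for (int i = 0; i < n; i++) {
        scorers[i] = i;
      }
      reverseAllButLast(scorers);
      System.out.println(n + ": " + Arrays.toString(scorers));
    }
  }
}
```

With this bound, 7 gives [5, 4, 3, 2, 1, 0, 6] and 3 gives [1, 0, 2]; even sizes 
perform the same swaps as before.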



> Things to be done now that Filter is independent from BitSet
> ------------------------------------------------------------
>
>                 Key: LUCENE-1187
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1187
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Priority: Minor
>
> (Aside: where is the documentation on how to mark up text in jira comments?)
> The following things are left over after LUCENE-584 :
> For Lucene 3.0  Filter.bits() will have to be removed.
> There is a CHECKME in IndexSearcher about using ConjunctionScorer to have the 
> boolean behaviour of a Filter.
> I have not looked into Filter caching yet, but I suppose there will be some 
> room for improvement there.
> Iirc the current core has moved to use OpenBitSetFilter and that is probably 
> what is being cached.
> In some cases it might be better to cache a SortedVIntList instead.
> Boolean logic on DocIdSetIterator is already available for Scorers (that 
> inherit from DocIdSetIterator) in the search package. This is currently 
> implemented by ConjunctionScorer, DisjunctionSumScorer,
> ReqOptSumScorer and ReqExclScorer.
> Boolean logic on BitSets is available in contrib/misc and contrib/queries
> DisjunctionSumScorer calls score() on its subscorers before the score value 
> is actually needed.
> This could be a reason to introduce a DisjunctionDocIdSetIterator, perhaps as 
> a superclass of DisjunctionSumScorer.
> To fully implement non scoring queries a TermDocIdSetIterator will be needed, 
> perhaps as a superclass of TermScorer.
> The javadocs in org.apache.lucene.search using matching vs non-zero score:
> I'll investigate this soon, and provide a patch when necessary.
> An early version of the patches of LUCENE-584 contained a class Matcher,
> that differs from the current DocIdSet in that Matcher has an explain() 
> method.
> It remains to be seen whether such a Matcher could be useful between
> DocIdSet and Scorer.
> The semantics of scorer.skipTo(scorer.doc()) was discussed briefly.
> This was also discussed at another issue recently, so perhaps it is worthwhile 
> to open a separate issue for this.
> Skipping on a SortedVIntList is done using linear search; this could be 
> improved by adding multilevel skiplist info much like in the Lucene index for 
> documents containing a term.
> One comment by me of 3 Dec 2008:
> A few complete (test) classes are deprecated, it might be good to add the 
> target release for removal there.
