On Tuesday 15 November 2005 23:45, Yonik Seeley wrote:
> Totally untested, but here is a hack at what the scorer might look
> like when the number of terms is large.
> 
> -Yonik
> 
> 
> package org.apache.lucene.search;
> 
> import org.apache.lucene.index.TermEnum;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.TermDocs;
> 
> import java.io.IOException;
> 
> /**
>  * @author yonik
>  * @version $Id$
>  */
> public class MultiTermScorer extends Scorer{
>   protected final float[] scores;
>   protected int pos;
>   protected float docScore;
> 
>   public MultiTermScorer(Similarity similarity, IndexReader reader,
>       Weight w, TermEnum terms, byte[] norms, boolean include_idf,
>       boolean include_tf) throws IOException {
>     super(similarity);
>     float weightVal = w.getValue();
>     int maxDoc = reader.maxDoc();
>     this.scores = new float[maxDoc];
>     float[] normDecoder = Similarity.getNormDecoder();
> 
>     TermDocs tdocs = reader.termDocs();

This part is only needed at the top level of the query, so
one could implement it in this optimization hook of BooleanScorer:

  /** Expert: Collects matching documents in a range.
   * <br>Note that {@link #next()} must be called once before this method is
   * called for the first time.
   * @param hc The collector to which all matching documents are passed through
   * {@link HitCollector#collect(int, float)}.
   * @param max Do not score documents past this.
   * @return true if more matching documents may remain.
   */
  protected boolean score(HitCollector hc, int max) throws IOException {
...
  }

>     while (terms.next()) {
>       tdocs.seek(terms);

That would be tdocs.seek(terms.term()), iirc.

>       float termScore = weightVal;
>       if (include_idf) {
>         termScore *= similarity.idf(terms.docFreq(),maxDoc);
>       }
>       while (tdocs.next()) {
>         int doc = tdocs.doc();
>         float subscore = termScore;
>         if (include_tf) subscore *= tdocs.freq();

getSimilarity().tf(tdocs.freq());
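To illustrate why going through tf() matters: a minimal sketch, assuming
the square-root dampening that Lucene's DefaultSimilarity applies to the
raw term frequency. The class name TfSketch is hypothetical and only
stands in for a Similarity implementation.

```java
/** Hypothetical stand-in for a Similarity; not part of Lucene. */
final class TfSketch {
    /** Square-root dampening of the raw within-document frequency. */
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        float termScore = 1.0f;
        int freq = 4;
        // Raw frequency, as in the quoted line: grows linearly with freq.
        float raw = termScore * freq;
        // Through tf(): a term occurring 4 times counts twice, not four times.
        float dampened = termScore * tf(freq);
        System.out.println(raw + " vs " + dampened);
    }
}
```

With the raw frequency, long documents that repeat a term would dominate
the accumulated scores[] entries.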

>         if (norms!=null) subscore *= normDecoder[norms[doc] & 0xff];
>         scores[doc] += subscore;

The scores[] array is the pain point (one float per document, maxDoc
entries in total), but when it can be used, this can be generalized to
DisjunctionSumScorer, so it would work for all disjunctions, not only terms.

I think it is possible to implement this hook for
DisjunctionSumScorer with a scores[] array, iterating over the
subscorers one by one.
Getting that hook called through BooleanScorer2 is no problem
when the coordination factor can be left out.
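A rough sketch of that idea, untested against Lucene itself. SubScorer,
Collector, ArraySubScorer and ScoreArraySketch are hypothetical stand-ins
for Scorer, HitCollector and DisjunctionSumScorer, and max is assumed to
be exclusive. No coordination factor is applied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a subscorer: matching docs in increasing order. */
interface SubScorer {
    boolean next();   // advance; false when exhausted
    int doc();        // current doc id, valid after next() returned true
    float score();    // score of the current doc
}

/** Hypothetical stand-in for HitCollector. */
interface Collector {
    void collect(int doc, float score);
}

/** Array-backed subscorer, only here to exercise the sketch. */
final class ArraySubScorer implements SubScorer {
    private final int[] docs;
    private final float[] scores;
    private int i = -1;

    ArraySubScorer(int[] docs, float[] scores) {
        this.docs = docs;
        this.scores = scores;
    }

    public boolean next() { return ++i < docs.length; }
    public int doc() { return docs[i]; }
    public float score() { return scores[i]; }
}

final class ScoreArraySketch {
    private final float[] scores;     // one entry per document: the pain point
    private final boolean[] matched;  // distinguishes "no match" from score 0
    private int pos = 0;              // next doc id to hand to the collector

    /** Exhausts the subscorers one by one, summing their scores per doc. */
    ScoreArraySketch(List<SubScorer> subScorers, int maxDoc) {
        scores = new float[maxDoc];
        matched = new boolean[maxDoc];
        for (SubScorer s : subScorers) {
            while (s.next()) {
                scores[s.doc()] += s.score();
                matched[s.doc()] = true;
            }
        }
    }

    /** Collects matching docs with id below max; true if more may remain. */
    boolean score(Collector hc, int max) {
        int end = Math.min(max, scores.length);
        for (; pos < end; pos++) {
            if (matched[pos]) {
                hc.collect(pos, scores[pos]);
            }
        }
        return pos < scores.length;
    }

    public static void main(String[] args) {
        List<SubScorer> subs = new ArrayList<>();
        subs.add(new ArraySubScorer(new int[] {1, 3}, new float[] {1.0f, 2.0f}));
        subs.add(new ArraySubScorer(new int[] {3, 4}, new float[] {0.5f, 4.0f}));
        ScoreArraySketch scorer = new ScoreArraySketch(subs, 6);
        while (scorer.score((doc, score) -> System.out.println(doc + " " + score), 4)) {
            scorer.score((doc, score) -> System.out.println(doc + " " + score), 6);
        }
    }
}
```

The subscorers never need to be merged through a priority queue here; the
price is the maxDoc-sized arrays and that every subscorer is exhausted
before the first hit is collected.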

Regards,
Paul Elschot

