On Tuesday 15 November 2005 23:45, Yonik Seeley wrote:
> Totally untested, but here is a hack at what the scorer might look
> like when the number of terms is large.
>
> -Yonik
>
>
> package org.apache.lucene.search;
>
> import org.apache.lucene.index.TermEnum;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.TermDocs;
>
> import java.io.IOException;
>
> /**
>  * @author yonik
>  * @version $Id$
>  */
> public class MultiTermScorer extends Scorer{
>   protected final float[] scores;
>   protected int pos;
>   protected float docScore;
>
>   public MultiTermScorer(Similarity similarity, IndexReader reader,
> Weight w, TermEnum terms, byte[] norms, boolean include_idf, boolean
> include_tf) throws IOException {
>     super(similarity);
>     float weightVal = w.getValue();
>     int maxDoc = reader.maxDoc();
>     this.scores = new float[maxDoc];
>     float[] normDecoder = Similarity.getNormDecoder();
>
>     TermDocs tdocs = reader.termDocs();

This part is only needed at the top level of the query, so one could
implement it in this optimization hook of BooleanScorer:

  /** Expert: Collects matching documents in a range.
   * <br>Note that {@link #next()} must be called once before this method is
   * called for the first time.
   * @param hc The collector to which all matching documents are passed through
   * {@link HitCollector#collect(int, float)}.
   * @param max Do not score documents past this.
   * @return true if more matching documents may remain.
   */
  protected boolean score(HitCollector hc, int max) throws IOException { ... }

>     while (terms.next()) {
>       tdocs.seek(terms);

terms.term(), iirc.

>       float termScore = weightVal;
>       if (include_idf) {
>         termScore *= similarity.idf(terms.docFreq(),maxDoc);
>       }
>       while (tdocs.next()) {
>         int doc = tdocs.doc();
>         float subscore = termScore;
>         if (include_tf) subscore *= tdocs.freq();

getSimilarity().tf(tdocs.freq());

>         if (norms!=null) subscore *= normDecoder[norms[doc&0xff]];
>         scores[doc] += subscore;

The scores[] array is the pain point, but when it can be used, this can be
generalized to DisjunctionSumScorer, so it would work for all disjunctions,
not only terms.
I think it is possible to implement this hook for DisjunctionSumScorer
with a scores[] array, iterating over the subscorers one by one.
Getting that hook called through BooleanScorer2 is no problem when
the coordination factor can be left out.

Regards,
Paul Elschot
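
To make the scores[] idea for a general disjunction a bit more concrete, here is a rough sketch in plain Java. It does not use the Lucene classes: SubScorer, Collector, ArraySubScorer and ArrayDisjunction are hypothetical stand-ins invented for this illustration, and the matched[] flags are an added assumption (the quoted code only keeps scores[]). It also assumes that max is exclusive and that no coordination factor is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ScoresArraySketch {

    /** Hypothetical stand-in for a subscorer: yields (doc, score) pairs. */
    interface SubScorer {
        boolean next();
        int doc();
        float score();
    }

    /** Hypothetical stand-in for a hit collector. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Array-backed subscorer, only used to drive the demo. */
    static final class ArraySubScorer implements SubScorer {
        private final int[] docs;
        private final float[] scores;
        private int i = -1;

        ArraySubScorer(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        public boolean next() { return ++i < docs.length; }
        public int doc() { return docs[i]; }
        public float score() { return scores[i]; }
    }

    /** Disjunction that sums subscorer scores into one array of size maxDoc. */
    static final class ArrayDisjunction {
        private final float[] scores;
        private final boolean[] matched; // assumption: track matches explicitly
        private int pos = 0;             // next document number to hand out

        ArrayDisjunction(List<SubScorer> subScorers, int maxDoc) {
            scores = new float[maxDoc];
            matched = new boolean[maxDoc];
            // Exhaust the subscorers one by one; no priority queue is needed.
            for (SubScorer s : subScorers) {
                while (s.next()) {
                    scores[s.doc()] += s.score();
                    matched[s.doc()] = true;
                }
            }
        }

        /** Collects matching docs below max; returns true if more may remain. */
        boolean score(Collector hc, int max) {
            int end = Math.min(max, scores.length);
            for (; pos < end; pos++) {
                if (matched[pos]) {
                    hc.collect(pos, scores[pos]);
                }
            }
            return pos < scores.length;
        }
    }

    /** Runs two subscorers through the disjunction in two ranges. */
    static List<String> demo() {
        List<SubScorer> subs = new ArrayList<>();
        subs.add(new ArraySubScorer(new int[] {0, 2}, new float[] {1.0f, 1.5f}));
        subs.add(new ArraySubScorer(new int[] {2, 5}, new float[] {2.0f, 0.5f}));
        ArrayDisjunction disj = new ArrayDisjunction(subs, 8);

        List<String> hits = new ArrayList<>();
        Collector hc = (doc, score) -> hits.add(doc + ":" + score);
        disj.score(hc, 3); // collects docs 0 and 2
        disj.score(hc, 8); // collects doc 5
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost is the same as in the quoted hack: one float per document in the index, allocated up front, which is why this only looks attractive at the top level of a query.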