> : ! If a document does not contain a queryterm this score can be larger
> : or smaller than 0 !
>
> if a document doesn't contain a term, then the scorer for
> that query will never even try to score that document --
> regardless of what your Similarity class looks like.
>
> if you really want this kind of behavior, you'll need to roll
> your own TermQuery/TermScorer classes and change next and
> skipTo to always advance to the next doc -- regardless of
> whether or not it matches (you can check for that in the score
> function and act accordingly)
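
If I understand the suggestion correctly, it amounts to something like the sketch below. This is plain Java and does not use Lucene's actual TermScorer or Similarity classes. The class name DenseTermScorerSketch, the missingScore parameter, the visited() counter and the topDocs helper are all hypothetical names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a custom term scorer; NOT Lucene's API.
final class DenseTermScorerSketch {
    private final int[] postingDocs;     // ascending ids of docs that contain the term
    private final float[] postingScores; // score of each matching doc, same order
    private final float missingScore;    // assumed score for docs without the term
    private final int maxDoc;            // number of documents in the index
    private int doc = -1;                // current doc id
    private int pos = 0;                 // cursor into the posting list
    private int visited = 0;             // how many doc ids next() has served

    DenseTermScorerSketch(int[] postingDocs, float[] postingScores,
                          float missingScore, int maxDoc) {
        this.postingDocs = postingDocs;
        this.postingScores = postingScores;
        this.missingScore = missingScore;
        this.maxDoc = maxDoc;
    }

    // Always advances to the next doc id, whether or not that doc matches.
    boolean next() {
        doc++;
        // Move the posting cursor past entries that lie before the current doc.
        while (pos < postingDocs.length && postingDocs[pos] < doc) {
            pos++;
        }
        if (doc < maxDoc) {
            visited++;
            return true;
        }
        return false;
    }

    int doc() { return doc; }

    int visited() { return visited; }

    // The match check happens here: real score on a match, fallback otherwise.
    float score() {
        boolean matches = pos < postingDocs.length && postingDocs[pos] == doc;
        return matches ? postingScores[pos] : missingScore;
    }

    private static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Collects the n best doc ids, highest score first, using a min-heap of size n.
    static List<Integer> topDocs(DenseTermScorerSketch scorer, int n) {
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        while (scorer.next()) {
            heap.add(new Hit(scorer.doc(), scorer.score()));
            if (heap.size() > n) {
                heap.poll(); // drop the weakest of the n + 1 candidates
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().doc);
        }
        Collections.reverse(result); // heap yields ascending scores
        return result;
    }
}
```

Note that a scorer written this way serves every doc id up to maxDoc, not only the ids in the posting list, so the number of score calculations grows with the size of the index rather than with the document frequency of the term.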

That sounds like a reasonable approach. However, I still need the search to be optimized for retrieving the first n hits. (I wrote my own implementation outside the Lucene search architecture, and it was unbelievably slow.)

For example, take a query with two terms, "fast" and "car", with document frequencies of 300,000 and 20,000 in the index respectively. In the worst case this requires 320,000 document scores to be calculated.

I am not sure how Lucene optimizes its search, but my guess is that it first processes the documents with the highest term frequencies for these query terms (and thus the highest combined score), and prunes the search once n hits have been found and it is certain that no remaining document can score higher. If I change the next function in my own scorer to visit all document ids, I am afraid I will wreck that optimization, because I would no longer be serving the documents in descending term frequency order.

Can someone tell me whether Lucene indeed requires <scorer>.next() to return the documents for a term in descending term frequency order?

Regards,
Dolf