> : ! If a document does not contain a query term, this score can be larger
> : or smaller than 0 !
> 
> if a document doesn't contain a term, then the scorer for 
> that query will never even try to score that document -- 
> regardless of what your Similarity class looks like.
> 
> if you really want this kind of behavior, you'll need to roll 
> your own TermQuery/TermScorer classes and change next and 
> skipTo to always advance to the next doc -- regardless of 
> whether or not it matches (you can check for that in the score 
> function and act accordingly)

That sounds like a reasonable approach. However, I still need the search 
process to be optimized for retrieving the first n hits. (I wrote my own 
implementation outside the Lucene search architecture, and it was unbelievably 
slow.)
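
The suggested approach could be sketched roughly as follows. This is a 
standalone illustration, not Lucene's actual TermScorer: the class name 
AllDocsTermScorer, its constructor arguments, and the use of the raw term 
frequency as the score are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical stand-in for a custom term scorer. It does not extend or
// call any Lucene class; all names here are made up for illustration.
class AllDocsTermScorer {
    private final int maxDoc;          // number of documents in the index
    private final int[] matchingDocs;  // sorted ids of documents containing the term
    private final float[] termFreqs;   // term frequency for each matching document
    private final float missingScore;  // score given to documents without the term
    private int doc = -1;              // current position, before the first document

    AllDocsTermScorer(int maxDoc, int[] matchingDocs, float[] termFreqs,
                      float missingScore) {
        this.maxDoc = maxDoc;
        this.matchingDocs = matchingDocs;
        this.termFreqs = termFreqs;
        this.missingScore = missingScore;
    }

    // Always advances to the next document id, whether or not it matches.
    boolean next() {
        doc++;
        return doc < maxDoc;
    }

    // Moves to the first document id >= target that lies beyond the current one.
    boolean skipTo(int target) {
        doc = Math.max(doc + 1, target);
        return doc < maxDoc;
    }

    int doc() {
        return doc;
    }

    // Checks for a match here: a placeholder score (the raw term frequency)
    // for matching documents, the fixed missingScore otherwise.
    float score() {
        int i = Arrays.binarySearch(matchingDocs, doc);
        return i >= 0 ? termFreqs[i] : missingScore;
    }

    public static void main(String[] args) {
        AllDocsTermScorer scorer =
            new AllDocsTermScorer(5, new int[] {1, 3}, new float[] {2f, 4f}, -1f);
        while (scorer.next()) {
            System.out.println("doc " + scorer.doc() + " score " + scorer.score());
        }
    }
}
```

Note that such a scorer visits every document id up to maxDoc for each term, 
which is exactly why the cost of retrieving the first n hits matters.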

For example, take a query containing two terms, "fast" and "car", with document 
frequencies of 300,000 and 20,000 in the index respectively. In the worst case 
(no document contains both terms) this would require 320,000 document scores 
to be calculated. I am not really sure how Lucene optimizes its search, but I 
guess it does so by first processing the documents with the highest term 
frequencies for these query terms (and thus the highest combined score), and 
pruning the search once n hits have been found and it is certain that no 
remaining document can score higher.
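
For comparison, here is a minimal sketch of the unpruned alternative: every 
candidate document is scored once, in document id order, and only the best n 
are kept in a bounded min-heap. The class TopNHits and its methods are 
hypothetical names for this example, and the sketch makes no claim about how 
Lucene's own classes work internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: keeps the n best-scoring hits
// while documents arrive in document id order (not in score order).
class TopNHits {
    // A (document id, score) pair.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns the n highest-scoring hits, best first. scoresByDoc[d] is the
    // already computed score of document d.
    static List<Hit> topN(float[] scoresByDoc, int n) {
        List<Hit> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap on score: the weakest hit kept so far sits at the head.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(n, (a, b) -> Float.compare(a.score, b.score));
        for (int doc = 0; doc < scoresByDoc.length; doc++) {
            float score = scoresByDoc[doc];
            if (heap.size() < n) {
                heap.add(new Hit(doc, score));
            } else if (score > heap.peek().score) {
                heap.poll();                    // drop the current weakest hit
                heap.add(new Hit(doc, score));
            }
        }
        while (!heap.isEmpty()) {
            result.add(heap.poll());            // weakest first
        }
        Collections.reverse(result);            // best first
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.0f, 3.0f, 0.1f};
        for (Hit hit : topN(scores, 2)) {
            System.out.println("doc " + hit.doc + " score " + hit.score);
        }
    }
}
```

With this scheme the order in which documents are served does not affect the 
result, but nothing is pruned: in the example above all 320,000 scores would 
still be computed, with the heap adding only a small cost per document.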

If I changed the next function in my own scorer to process all document ids, 
I am afraid I would wreck Lucene's optimization method (as I would then no 
longer be serving the documents in descending term frequency order).

Perhaps someone can tell me whether Lucene indeed requires <scorer>.next() to 
return the documents for a term in descending term frequency order?

Regards,
Dolf
