I think the TermScorer could be used to produce some useful feedback on performance of terms used in queries with the addition of some new methods:
int getNumDocMatches();
Is this just IndexReader#docFreq(Term), or is it the sum of all of the TermDocs#freq() values for the term?
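To make the two readings concrete, here is a minimal sketch that does not touch Lucene at all. It assumes a term's postings are modelled as a plain map from document number to within-document frequency; the class, the method names, and the sample data are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TermCountsSketch {
    // Reading 1: the number of documents that contain the term at least once.
    static int docFreq(Map<Integer, Integer> postings) {
        return postings.size();
    }

    // Reading 2: the total number of occurrences, summed over all documents.
    static long totalOccurrences(Map<Integer, Integer> postings) {
        long total = 0;
        for (int freq : postings.values()) {
            total += freq;
        }
        return total;
    }

    // Made-up postings: document number -> occurrences of the term in it.
    static Map<Integer, Integer> samplePostings() {
        Map<Integer, Integer> postings = new LinkedHashMap<>();
        postings.put(3, 1);
        postings.put(7, 4);
        postings.put(12, 2);
        return postings;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> postings = samplePostings();
        System.out.println("documents containing term: " + docFreq(postings));
        System.out.println("total occurrences: " + totalOccurrences(postings));
    }
}
```

The two numbers agree only when the term occurs exactly once in every matching document, so the method name alone does not say which one is meant.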
float getAverageScore();
Would the average really be that useful? It could be the same for a term that has ten very strong matches and ninety very weak matches as for a term that has 100 middling matches.
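A small sketch of that objection, using invented score values rather than anything produced by a real TermScorer. The class name, the helper methods, and the numbers are all assumptions chosen so that the two distributions share a mean.

```java
import java.util.Arrays;

class AverageScoreSketch {
    // Arithmetic mean of the per-document scores for one term.
    static double mean(double[] scores) {
        return Arrays.stream(scores).average().orElse(0.0);
    }

    // Largest per-document score, to show what the mean hides.
    static double max(double[] scores) {
        return Arrays.stream(scores).max().orElse(0.0);
    }

    // Ten very strong matches followed by ninety very weak ones (made-up values).
    static double[] skewedScores() {
        double[] scores = new double[100];
        Arrays.fill(scores, 0, 10, 0.91);
        Arrays.fill(scores, 10, 100, 0.01);
        return scores;
    }

    // One hundred middling matches (made-up value).
    static double[] uniformScores() {
        double[] scores = new double[100];
        Arrays.fill(scores, 0.10);
        return scores;
    }

    public static void main(String[] args) {
        System.out.println("skewed:  mean=" + mean(skewedScores()) + " max=" + max(skewedScores()));
        System.out.println("uniform: mean=" + mean(uniformScores()) + " max=" + max(uniformScores()));
    }
}
```

Both arrays average out to roughly the same value, yet their best matches differ by a wide margin, which is the information an average alone would discard.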
These could be used in the following scenarios:
* selecting which terms to offer spelling correction on (when numDocMatches==0)
Would the above be better than IndexReader#docFreq(Term) for this?
* influencing the highlighter selections (doc fragments scored based on contained term weights)
I don't see how the above would help here. The ideal way to score fragments would be to create an index (e.g., using a RAMDirectory) of fragments, then search this with the query to find the top matches. One can approximate this more efficiently by looking for fragments with a high density of query terms, perhaps taking idfs into account.
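The density approximation could look something like the sketch below. It is not how the highlighter works; it assumes fragments are already tokenized into lists of strings and that each query term comes with a precomputed weight such as its idf. The class name, the method names, and the sample weights are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FragmentDensitySketch {
    // Sum the weights of tokens that are query terms, then divide by the
    // fragment length so that short, dense fragments beat long, sparse ones.
    static double score(List<String> fragmentTokens, Map<String, Double> queryTermWeights) {
        if (fragmentTokens.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String token : fragmentTokens) {
            Double weight = queryTermWeights.get(token);
            if (weight != null) {
                sum += weight;
            }
        }
        return sum / fragmentTokens.size();
    }

    // Made-up weights standing in for idfs: rare terms high, noise words low.
    static Map<String, Double> sampleWeights() {
        Map<String, Double> weights = new HashMap<>();
        weights.put("lucene", 6.0);
        weights.put("scorer", 5.0);
        weights.put("the", 1.1);
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = sampleWeights();
        System.out.println(score(List.of("the", "lucene", "scorer"), weights));
        System.out.println(score(List.of("the", "cat", "sat", "on", "the", "mat"), weights));
    }
}
```

Dividing by fragment length is one design choice among several; summing weights without normalizing would favor longer fragments instead.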
* For "more like this" natural language type queries the highlighter could highlight only "significantly" scored terms and ignore low-scoring noise words.
The best method to identify significant words is with Similarity#idf(Term,Searcher). Significant words have higher idfs, noise words have lower idfs.
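As a rough illustration of why idf separates the two kinds of words, the sketch below uses the formula log(numDocs / (docFreq + 1)) + 1, which is my recollection of what the default Similarity computes; treat the exact formula as an assumption. The class name, the threshold parameter, and the document counts in main are hypothetical.

```java
class IdfSketch {
    // Inverse document frequency: terms found in few documents score high,
    // terms found in most documents score close to 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Treat a term as significant when its idf reaches a caller-chosen threshold.
    static boolean isSignificant(int docFreq, int numDocs, double threshold) {
        return idf(docFreq, numDocs) >= threshold;
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        System.out.println("rare term idf:  " + idf(5, numDocs));
        System.out.println("noise word idf: " + idf(900_000, numDocs));
    }
}
```

Both inputs, docFreq and the document count, come from the term dictionary and index statistics, so computing this needs no pass over the postings.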
I know it would be possible to derive all this information using existing APIs but it would effectively involve another pass of the same index data.
Unless I am mistaken, I think most of what you're after can be accomplished with only another access to the term dictionary data, and does not require another pass over, e.g., the TermDocs.
Doug