Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

Yonik Seeley Fri, 25 May 2007 08:23:05 -0700

On 5/25/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote:

> In reading the math for scoring at the bottom of:
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
>
> It appears that if I can make tf() and idf(), term frequency and
> inverse document frequency respectively, both return 1, then coord(),
> which then becomes the primary factor of the product, is what I'm
> looking for.
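A quick way to check the idea is to plug tf() = idf() = 1 into the formula from the Similarity javadoc. The Java sketch below is a hypothetical model, not Lucene code: the class and method names are made up, and boosts and norms are simply held at 1, which real Lucene does not guarantee.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Lucene code: the Similarity javadoc formula
//   score = coord * queryNorm * sum over matched terms of (tf * idf^2 * boost * norm)
// with tf() and idf() both returning 1. Boosts and norms are held at 1 here.
public class FlatScoreModel {

    // Stand-ins for the proposed tf()/idf() overrides: always 1.
    static float tf(float freq) { return 1.0f; }
    static float idf() { return 1.0f; }

    // Same shape as DefaultSimilarity.coord(): share of query terms matched.
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    // Scores one document (its token list) against a set of query terms.
    static float score(Set<String> queryTerms, List<String> docTokens) {
        // queryNorm = 1 / sqrt(sum of squared weights); each weight is idf * boost = 1.
        double sumOfSquaredWeights = 0;
        for (int i = 0; i < queryTerms.size(); i++) {
            sumOfSquaredWeights += idf() * idf();
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);

        int overlap = 0;
        double sum = 0;
        for (String term : queryTerms) {
            int freq = Collections.frequency(docTokens, term);
            if (freq == 0) {
                continue; // only matched terms contribute
            }
            overlap++;
            sum += tf(freq) * idf() * idf() * queryNorm; // boost and norm assumed 1
        }
        return (float) (coord(overlap, queryTerms.size()) * sum);
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("apache", "lucene");
        // Three occurrences of one query term...
        System.out.println(score(query, List.of("lucene", "lucene", "lucene")));
        // ...score below one occurrence each of two query terms.
        System.out.println(score(query, List.of("apache", "lucene")));
    }
}
```

With n query terms and m of them present in the document, this model gives score = (m/n) * (1/sqrt(n)) * m = m^2 / n^1.5, so repeated occurrences change nothing and the ranking depends only on m.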


Pretty close, I think.  There is still the length normalization factor
that biases scoring toward short fields over long ones.  That's calculated
at index time and stored in the "norm" along with the boost (the two are
multiplied together).
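To see how the norm undoes the flat scoring, here is a second hypothetical model in the same spirit (plain arithmetic, not Lucene code; names are made up). It assumes DefaultSimilarity's lengthNorm of 1/sqrt(numTerms), multiplies it by the boost as described above, keeps tf() = idf() = 1, and ignores the single-byte encoding Lucene applies to stored norms, so real values will be slightly coarser.

```java
// Hypothetical model, not Lucene code: how the index-time norm
// (boost * lengthNorm) skews a score where tf() = idf() = 1.
public class NormBiasModel {

    // Assumed DefaultSimilarity-style length normalization: 1 / sqrt(numTerms).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // The stored norm: boost multiplied by the length factor.
    static double norm(double boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    // coord * queryNorm * (norm summed over the matched terms).
    // The norm is per field, so every matched term contributes the same value.
    static double score(int matched, int queryTerms, double norm) {
        double coord = (double) matched / queryTerms;
        double queryNorm = 1.0 / Math.sqrt(queryTerms);
        return coord * queryNorm * matched * norm;
    }

    public static void main(String[] args) {
        // Two-term query: a 1-term field matching one term
        // versus a 100-term field matching both terms.
        double shortField = score(1, 2, norm(1.0, 1));
        double longField = score(2, 2, norm(1.0, 100));
        System.out.println("short field: " + shortField + ", long field: " + longField);
    }
}
```

With a two-term query, the one-term field that matches a single term scores about 0.354, while the 100-term field that matches both scores about 0.141, so the shorter field wins despite hitting fewer unique terms.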

You can change the similarity used during indexing, or you can knock
out norms completely via Field.setOmitNorms(true).
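The two options differ in what happens to index-time boosts, since those live in the same norm. The sketch below is hypothetical arithmetic, not Lucene code: option 1 assumes a custom similarity (installed at index time, e.g. via IndexWriter.setSimilarity() in Lucene 2.x) whose lengthNorm() returns 1, and option 2 models Field.setOmitNorms(true), whose javadoc describes it as disabling both index-time boosts and length normalization for the field.

```java
// Hypothetical comparison, not Lucene code: the norm a field ends up with
// under each of the two options, for a field indexed with a boost.
public class NormOptions {

    // Option 1 (assumed): a custom similarity whose lengthNorm() ignores
    // field length. The index-time boost still reaches the norm.
    static double normWithFlatLengthNorm(double boost, int numTerms) {
        double lengthNorm = 1.0; // numTerms deliberately ignored
        return boost * lengthNorm;
    }

    // Option 2: norms omitted for the field. Every document behaves as if
    // its norm were 1, so the index-time boost is discarded as well.
    static double normWithOmittedNorms(double boost, int numTerms) {
        return 1.0;
    }

    public static void main(String[] args) {
        for (int numTerms : new int[] {1, 100}) {
            System.out.println(numTerms + " terms, boost 2.0: flat lengthNorm = "
                    + normWithFlatLengthNorm(2.0, numTerms)
                    + ", omitted norms = " + normWithOmittedNorms(2.0, numTerms));
        }
    }
}
```

If per-field or per-document boosts matter, the custom lengthNorm() route keeps them; if not, omitting norms also saves the one byte per document per field that norms occupy.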

-Yonik
