Chris Hostetter wrote:
The tf(), idf(), lengthNorm() and queryNorm() are directly from the cosine measure, although lengthNorm()'s default implementation uses an approximation.

As I actually found normalized query scores quite useful, I decided to exit my usual lurk mode :)

I integrated Lucene with Carrot2 (more specifically, Carrot2's Lingo clustering algorithm, which at its core is based on cosine products), and in order to incrementally restrict a Lucene query to Carrot2 clusters it is really essential that the Lucene query scores are, more or less, what a cosine product would give.
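
To make concrete what I mean by "cosine product", here is the kind of value the clustering side works with, sketched in plain Java (illustrative only, not Carrot2's actual code; the sparse vectors are just term -> tf-idf weight maps):

    import java.util.Map;

    public final class CosineProduct {
        // Plain cosine similarity between two sparse tf-idf vectors (term -> weight).
        public static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0.0, normA = 0.0, normB = 0.0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                Double w = b.get(e.getKey());
                if (w != null) dot += e.getValue() * w;   // accumulate the dot product
                normA += e.getValue() * e.getValue();     // |a|^2
            }
            for (double w : b.values()) normB += w * w;   // |b|^2
            if (normA == 0.0 || normB == 0.0) return 0.0;
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }
    }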

From memory, I think I could post-process the scores into a cosine product using sumOfSquaredWeights(), just as Query.weight() does now, but my point is slightly different.
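
Roughly something like the following, sketched from memory against the Weight API of that era (method names and visibility may well differ between versions, so take it as an illustration rather than working code):

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.Weight;

    public final class CosineRescale {
        // Re-derive the 1/sqrt(sumOfSquaredWeights) factor the same way
        // Query.weight() does, so raw hit scores can be scaled back toward
        // a plain cosine product.
        public static float queryNormFactor(Query query, Searcher searcher) throws IOException {
            Weight weight = query.weight(searcher);      // rewrites, creates and normalizes the Weight
            float sumSq = weight.sumOfSquaredWeights();  // only reading the value; the Weight is not reused
            return (float) (1.0 / Math.sqrt(sumSq));
        }
    }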

From a library user's point of view, I think it's important that Lucene offers clear, simple hooks to tweak (and even completely change) the computed score.
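
Similarity is already one such hook; a minimal, purely illustrative subclass of the DefaultSimilarity of that era, nudging the formula a bit closer to a plain cosine product, could look like this:

    import org.apache.lucene.search.DefaultSimilarity;

    public class CosineLikeSimilarity extends DefaultSimilarity {
        // A pure cosine product has no coordination factor.
        @Override
        public float coord(int overlap, int maxOverlap) {
            return 1.0f;
        }

        // Keep the raw term frequency instead of the default sqrt(freq) damping.
        @Override
        public float tf(float freq) {
            return freq;
        }
    }

To stay consistent it would have to be set via setSimilarity() on both the IndexWriter (at indexing time) and the Searcher (at query time).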

In some cases you need to compute a completely different score, and then you use a ValueSourceQuery. But sometimes you are "lucky" (read: I chose Lingo for that reason, among others), because Lucene and the clustering algorithm already use [nearly] the same score, so you don't have to compute it again, which improves performance.
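
For the "completely different score" case, something as small as a FieldScoreQuery (one of the ValueSourceQuery flavours in the function package; the field name below is made up) already does the job:

    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.function.FieldScoreQuery;

    public final class PopularityScoring {
        // Score every matching document by its "popularity" field instead of
        // by text similarity ("popularity" is a hypothetical field name).
        public static Query byPopularity() {
            return new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
        }
    }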


Just my two cents,
Michele
