Chris Hostetter wrote:
The tf(), idf(), lengthNorm() and queryNorm() are directly from the cosine measure, although lengthNorm()'s default implementation uses an approximation.

As I actually found normalized query scores quite useful, I decided to exit my usual lurk mode :)

I integrated Lucene with Carrot2 (more specifically, Carrot2's Lingo clustering algorithm, which at its core is based on cosine products), and in order to incrementally restrict a Lucene query to Carrot2 clusters it is really essential that the Lucene query scores are, more or less, what a cosine product would give.
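
To make concrete what I mean by "cosine product", here is the kind of value the clustering side works with, sketched in plain Java (illustrative only, not Carrot2's actual code; the sparse vectors are just term -> tf-idf weight maps):

    import java.util.Map;

    public final class CosineProduct {
        // Plain cosine similarity between two sparse tf-idf vectors (term -> weight).
        public static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0.0, normA = 0.0, normB = 0.0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                Double w = b.get(e.getKey());
                if (w != null) dot += e.getValue() * w;   // accumulate the dot product
                normA += e.getValue() * e.getValue();     // |a|^2
            }
            for (double w : b.values()) normB += w * w;   // |b|^2
            if (normA == 0.0 || normB == 0.0) return 0.0;
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }
    }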

From memory, I think I could post-process the scores into a cosine product using sumOfSquaredWeights(), just as Query.weight() does now, but my point is slightly different.
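
Roughly something like the following, sketched from memory against the Weight API of that era (method names and visibility may well differ between versions, so take it as an illustration rather than working code):

    import java.io.IOException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.Weight;

    public final class CosineRescale {
        // Re-derive the 1/sqrt(sumOfSquaredWeights) factor the same way
        // Query.weight() does, so raw hit scores can be scaled back toward
        // a plain cosine product.
        public static float queryNormFactor(Query query, Searcher searcher) throws IOException {
            Weight weight = query.weight(searcher);      // rewrites, creates and normalizes the Weight
            float sumSq = weight.sumOfSquaredWeights();  // only reading the value; the Weight is not reused
            return (float) (1.0 / Math.sqrt(sumSq));
        }
    }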

From a library user's point of view, I think it's important that Lucene offers clear, simple hooks to tweak (and even completely change) the computed score.
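
Similarity is already one such hook; a minimal, purely illustrative subclass of the DefaultSimilarity of that era, nudging the formula a bit closer to a plain cosine product, could look like this:

    import org.apache.lucene.search.DefaultSimilarity;

    public class CosineLikeSimilarity extends DefaultSimilarity {
        // A pure cosine product has no coordination factor.
        @Override
        public float coord(int overlap, int maxOverlap) {
            return 1.0f;
        }

        // Keep the raw term frequency instead of the default sqrt(freq) damping.
        @Override
        public float tf(float freq) {
            return freq;
        }
    }

To stay consistent it would have to be set via setSimilarity() on both the IndexWriter (at indexing time) and the Searcher (at query time).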

In some cases you need to compute a completely different score, and then you use a ValueSourceQuery. But sometimes you are "lucky" (read: I chose Lingo for that reason, among others), because Lucene and the clustering algorithm already use [nearly] the same score, so you don't have to compute it again, which improves performance.
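
For the "completely different score" case, something as small as a FieldScoreQuery (one of the ValueSourceQuery flavours in the function package; the field name below is made up) already does the job:

    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.function.FieldScoreQuery;

    public final class PopularityScoring {
        // Score every matching document by its "popularity" field instead of
        // by text similarity ("popularity" is a hypothetical field name).
        public static Query byPopularity() {
            return new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
        }
    }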


Just my two cents,
Michele
