Grant Ingersoll wrote:
>
> What I would like to get at is why anyone thinks scores are
> comparable across queries to begin with.
>

They are somewhat comparable because the score is the approximate cosine between the document and query vectors, plus boosts and so on. It measures how close the vectors are to each other. If q1 has a smaller angle with d1 than q2 does with d2, then you can make a comparison. It's just vector similarity. It's approximate because we fudge the normalization.

Why do you think the scores within a single query's results are comparable? What's the difference when you try another query? The query is the difference, and the query norm is what makes the scores more comparable. Another query is just a different query vector. It's still going to be at a given "angle" from the doc vectors, and closer is considered a better match.

We don't do it to improve anything, or because someone discovered something. It's just part of the formula for calculating the cosine: the dot product formula. You can lose it and keep the same relative rankings within a query, but then the score is further from the cosine, because you start scaling by the magnitude of the query vector. When you do that, scores are not so comparable.
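For reference, the dot product formula in question can be written out as below. This is only a sketch in notation assumed here (q is the query vector, d a document vector, θ the angle between them); the real score also includes boosts and only approximates the normalization.

```latex
\cos\theta = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}
\qquad\Longrightarrow\qquad
\frac{q \cdot d}{\lVert d \rVert} = \lVert q \rVert \cos\theta
```

The 1/‖q‖ factor is the role the query norm plays. It is constant within one query, so dropping it leaves the ranking unchanged but multiplies every score by ‖q‖.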
If you take out the queryNorm, the scores are much less comparable. You are effectively multiplying the cosine by the magnitude of the query vector, so different queries will scale the score differently, and not in a helpful way: a term vector and a query vector can have very different magnitudes but very similar term distributions. That's why we use the cosine rather than Euclidean distance in the first place.

Pretty sure it's more linear algebra than IR, or the vector stuff from calc 3 (or wherever else different schools put it).
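A toy illustration of the point about magnitudes, in plain Python rather than Lucene code. The vectors and the helper names (`cosine`, `score_without_query_norm`, `euclidean`) are made up for this sketch and ignore boosts and Lucene's approximate normalization.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine(q, d):
    # Full cosine: divide by both the query and document magnitudes.
    return dot(q, d) / (norm(q) * norm(d))

def score_without_query_norm(q, d):
    # Same as the cosine but with the query-magnitude division left out.
    return dot(q, d) / norm(d)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical term-weight vectors with the same term distribution.
d = [2.0, 1.0, 0.0]
q1 = [2.0, 1.0, 0.0]     # same direction and magnitude as d
q2 = [20.0, 10.0, 0.0]   # same direction, ten times the magnitude

cos1, cos2 = cosine(q1, d), cosine(q2, d)
raw1 = score_without_query_norm(q1, d)
raw2 = score_without_query_norm(q2, d)

# Both cosines are 1.0 up to floating-point rounding, so the two
# queries match d equally well and the scores are comparable.
print("cosine:", cos1, cos2)

# Without the query norm, q2's score is ten times q1's, purely
# because of the query's magnitude.
print("no query norm:", raw1, raw2)

# Euclidean distance also penalizes q2 for its magnitude alone.
print("euclidean:", euclidean(q1, d), euclidean(q2, d))
```

The ranking of documents within either query would be the same with or without the division by the query magnitude; only the cross-query comparison changes.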