..but this means that the scores are not comparable across queries:
a hit with a score of 0.7 from one query need not be as 'good' as
a 0.7 from another query. And the raw scores are only preserved when the
original, unnormalized top score was less than 1.0.
Does this really look like a feasible way to normalize similarity values,
especially with the distinction based on the top score? Can anyone say whether
such a normalization is meaningful or not, depending on the top score value?
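To make the concern concrete, here is a minimal sketch (plain Java, with made-up score values) of the top-score normalization described in this thread. It mirrors the behavior of Hits: divide by the top score only when that top score exceeds 1.0:

```java
public class ScoreNormDemo {

    // Normalize a hit list by its top score, but only when the top
    // score exceeds 1.0 -- mirroring what Hits.getMoreDocs() does.
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float scoreNorm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * scoreNorm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Query A: raw top score 2.0 -> every score is scaled by 0.5
        float[] a = normalize(new float[] {2.0f, 1.4f});
        // Query B: raw top score 0.7 -> scores are left untouched
        float[] b = normalize(new float[] {0.7f, 0.5f});
        // Both queries now report a hit scored 0.7, even though the
        // underlying raw scores (1.4 vs. 0.7) are quite different.
        System.out.println(a[1] + " vs " + b[0]);
    }
}
```

This is exactly the cross-query incomparability described above: whether a score was scaled at all depends on the query's own top score.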
I have had a further look, and it seems that the score values inside the
explanations are not normalized?! We need normalized similarity values
(e.g. in a range [0..1]) that are comparable across queries. So right now
we have two score values:
1. a normalized one from the Hits class, without cross-query comparability
2. an unnormalized one from IndexSearcher.explain(..), with cross-query
comparability
I am a little bit confused now.. does this mean that the default Similarity
implementation is not adequate for this kind of problem?
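If the raw, cross-query-comparable value is what is needed, one workaround is to read it back from the Explanation instead of from Hits. This is only a sketch against the old Hits API (ca. Lucene 1.4, requires the Lucene jar on the classpath), not a definitive recommendation:

```java
import java.io.IOException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class RawScores {
    // Recover the raw (unnormalized) score for the i-th hit.
    // Hits.score(i) may have been scaled down by 1/maxScore, but the
    // Explanation value is the raw score from the Similarity formula.
    static float rawScore(IndexSearcher searcher, Query query,
                          Hits hits, int i) throws IOException {
        return searcher.explain(query, hits.id(i)).getValue();
    }
}
```

The obvious drawback is that explain() re-scores the document and is far too slow to call per hit in production; a custom HitCollector that records the raw scores directly would avoid both the normalization and the overhead.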
best regards,
Chris
--
______________________________________________________________________
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer
Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Erwin-Schrödinger-Straße 57, D-67663 Kaiserslautern, Germany
Phone: +49.631.205-3441
mailto:[EMAIL PROTECTED] http://www.dfki.uni-kl.de/~reuschling/
______________________________________________________________________
Chris Lamprecht wrote:
It takes the highest-scoring document's score, if greater than 1.0, and
divides every hit's score by it, leaving them all <= 1.0.
Actually, I just looked at the code, and it does this by taking
1/maxScore and then multiplying each score by that (equivalent results
in the end, maybe more efficient(?)). See the method
getMoreDocs() in Hits.java (org.apache.lucene.search.Hits):
[...]
float scoreNorm = 1.0f;
if (length > 0 && topDocs.getMaxScore() > 1.0f) {
  scoreNorm = 1.0f / topDocs.getMaxScore();
}
int end = scoreDocs.length < length ? scoreDocs.length : length;
for (int i = hitDocs.size(); i < end; i++) {
  hitDocs.addElement(new HitDoc(scoreDocs[i].score * scoreNorm,
                                scoreDocs[i].doc));
}
On 1/27/06, xing jiang <[EMAIL PROTECTED]> wrote:
Hi,
I want to know how Lucene normalizes the score. I see the Hits class has
a function to get each document's score, but I don't know how Lucene
calculates the normalized score, and "Lucene in Action" only mentions the
normalized score of the nth top-scoring documents.
--
Regards
Jiang Xing
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------