..but this means that the scores are not comparable across queries, because
a hit with a score of 0.7 from one query is not necessarily as 'good' as a
0.7 from another query... and this is only the case when the original,
unnormalized top score was less than 1.0.

Does this really look like a feasible way to normalize similarity values,
especially with the distinction based on the top score? Can anyone say
whether such a normalization is meaningful or not, with respect to the top
score value?
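To make the concern concrete, here is a small standalone sketch (plain Java, not Lucene code; the score values are made up for illustration) of why two normalized 0.7 scores need not mean the same thing:

```java
public class ScoreNormSketch {
    // Mirrors the Hits normalization rule: scores are only scaled
    // down when the top raw score exceeds 1.0.
    static float normalize(float rawScore, float maxScore) {
        float scoreNorm = (maxScore > 1.0f) ? 1.0f / maxScore : 1.0f;
        return rawScore * scoreNorm;
    }

    public static void main(String[] args) {
        // Query A: top raw score 2.0 -> a raw 1.4 is scaled down to 0.7.
        float a = normalize(1.4f, 2.0f);
        // Query B: top raw score 0.9 -> a raw 0.7 is left untouched.
        float b = normalize(0.7f, 0.9f);
        // Both queries report 0.7, but the raw scores (1.4 vs 0.7) differ.
        System.out.println(a + " " + b);
    }
}
```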


I have taken a further look, and it seems that the score values inside the
explanations are not normalized?! We need normalized similarity values
(e.g. in the range [0..1]) that are comparable across queries. So we
currently have two score values:

1. a normalized one from the Hits class, without cross-query comparability
2. an unnormalized one from IndexSearcher.explain(..), with cross-query
comparability

I am a little bit confused now.. does this mean that the default similarity
implementation is not adequate for this kind of problem?

best regards,

Chris



--
______________________________________________________________________

Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer

Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Erwin-Schrödinger-Straße 57, D-67663 Kaiserslautern, Germany

Phone: +49.631.205-3441
mailto:[EMAIL PROTECTED]  http://www.dfki.uni-kl.de/~reuschling/
______________________________________________________________________




Chris Lamprecht wrote:
It takes the highest-scoring document's score, if greater than 1.0, and
divides every hit's score by it, leaving them all <= 1.0. Actually, I just
looked at the code, and it does this by taking 1/maxScore and then
multiplying each score by that factor (equivalent results in the end,
maybe more efficient?). See the method getMoreDocs() in Hits.java
(org.apache.lucene.search.Hits):

[...]
    float scoreNorm = 1.0f;

    if (length > 0 && topDocs.getMaxScore() > 1.0f) {
      scoreNorm = 1.0f / topDocs.getMaxScore();
    }

    int end = scoreDocs.length < length ? scoreDocs.length : length;
    for (int i = hitDocs.size(); i < end; i++) {
      hitDocs.addElement(new HitDoc(scoreDocs[i].score * scoreNorm,
                                    scoreDocs[i].doc));
    }
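For anyone who wants to poke at that logic outside Lucene, here is a minimal standalone replica of the same normalization step (the raw score array is made up; this is a sketch, not Lucene's own code):

```java
public class HitsNormDemo {
    // Rebuilds the normalization from Hits.getMoreDocs(): compute
    // 1/maxScore once, then multiply every score by it, instead of
    // dividing each score individually.
    static float[] normalize(float[] rawScores, float maxScore) {
        float scoreNorm = 1.0f;
        if (rawScores.length > 0 && maxScore > 1.0f) {
            scoreNorm = 1.0f / maxScore;      // one division...
        }
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scoreNorm; // ...then cheap multiplies
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {2.0f, 1.5f, 0.5f};
        float[] norm = normalize(raw, 2.0f);
        for (float s : norm) System.out.println(s); // 1.0, 0.75, 0.25
    }
}
```

Note that if maxScore is at most 1.0, the scores pass through untouched, which is exactly the asymmetry discussed above.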



On 1/27/06, xing jiang <[EMAIL PROTECTED]> wrote:
Hi,

I want to know how Lucene normalizes the score. I see the Hits class has a
function to get each document's score, but I don't know how Lucene
calculates the normalized score, and "Lucene in Action" only mentions the
normalized score of the nth top-scoring documents.
--
Regards

Jiang Xing



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



