Trying to put up an explanation:

0.022172567 = (MATCH) product of:
  0.07760398 = (MATCH) sum of:
    0.02287053 = (MATCH) weight(payload:ces in 550), product of:
      0.32539415 = queryWeight(payload:ces), product of:
        2.2491398 = idf(docFreq=157, maxDocs=551)
        0.14467494 = queryNorm
      0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
        1.0 = tf(termFreq(payload:ces)=1)
        2.2491398 = idf(docFreq=157, maxDocs=551)
        0.03125 = fieldNorm(field=payload, doc=550)
    0.05473345 = (MATCH) weight(payload:deal in 550), product of:
      0.23803486 = queryWeight(payload:deal), product of:
        1.6453081 = idf(docFreq=288, maxDocs=551)
        0.14467494 = queryNorm
      0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
        4.472136 = tf(termFreq(payload:deal)=20)
        1.6453081 = idf(docFreq=288, maxDocs=551)
        0.03125 = fieldNorm(field=payload, doc=550)
  0.2857143 = coord(2/7)
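The numbers in this tree can be checked by hand. Here is a minimal Python sketch (my own illustration, not Lucene code) that recombines the leaf values the way the "product of" and "sum of" labels indicate. The function name term_weight and the variable names are made up for the example; the constants are copied from the output.

```python
# Leaf values copied from the explain() output.
QUERY_NORM = 0.14467494
FIELD_NORM = 0.03125
COORD = 2 / 7  # 2 of the 7 query terms matched


def term_weight(tf, idf):
    """Recombine one 'weight(...)' node from its leaves (hypothetical helper)."""
    query_weight = idf * QUERY_NORM        # queryWeight = idf * queryNorm
    field_weight = tf * idf * FIELD_NORM   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight     # weight = queryWeight * fieldWeight


ces = term_weight(tf=1.0, idf=2.2491398)
deal = term_weight(tf=4.472136, idf=1.6453081)

# Top level: (sum of the per-term weights) * coord
score = (ces + deal) * COORD

print("weight(ces)  =", round(ces, 8))
print("weight(deal) =", round(deal, 8))
print("score        =", round(score, 9))
```

The printed values should agree with 0.02287053, 0.05473345 and 0.022172567 up to small rounding differences, since Lucene computes these in single precision.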
The factors that appear in the output:

1. tf = term frequency in document = measure of how often a term appears in the document
   Implementation: sqrt(freq)
   Implication: the more frequently a term occurs in a document, the greater its score
   Rationale: documents which contain more of a term are generally more relevant
2. idf = inverse document frequency = measure of how often the term appears across the index
   Implementation: log(numDocs/(docFreq+1)) + 1
   Implication: the more documents a term occurs in, the lower its score
   Rationale: common terms are less important than uncommon ones
3. coord = number of terms in the query that were found in the document
   Implementation: overlap / maxOverlap
   Implication: of the terms in the query, a document that contains more of them will have a higher score
   Rationale: self-explanatory
4. fieldNorm
   1. lengthNorm = measure of the importance of a term according to the total number of terms in the field
      Implementation: 1/sqrt(numTerms)
      Implication: a term matched in a field with fewer terms gets a higher score
      Rationale: a term in a field with fewer terms is more important than one in a field with more
   2. boost (index) = boost of the field at index time
      If an index-time boost was specified, it is folded into the fieldNorm value shown in the score.
   3. boost (query) = boost of the field at query time
      (This one shows up under queryWeight rather than inside fieldNorm; see point 6 below.)
5. queryNorm = normalization factor so that queries can be compared
   queryNorm is not related to the relevance of the document; it tries to make scores from different queries comparable. It is implemented as 1/sqrt(sumOfSquaredWeights).

When you search for

Query: It is definitely a CES deal that will be over in Sep or Oct of this year.

1. Lucene tries to match each word of the query in each field that you have specified to be searched on, e.g. payload in your case.
2. In your example it found a match only on ces and deal, hence only those two terms are displayed.
3. The number of query terms matched also contributes to the score through 0.2857143 = coord(2/7), i.e. 2 words matched out of 7. (The 7 is presumably the number of query terms left after your analyzer drops stop words such as "it", "is" and "a".)
4. idf(docFreq=157, maxDocs=551) indicates how rare the term is. docFreq is the number of documents that have the word in the field, and maxDocs is the total number of documents.
5. tf(termFreq(payload:ces)=1) is based on the number of times the term occurs in the field, e.g. 1 in this case.
6. The score for each field match,

   0.02287053 = (MATCH) weight(payload:ces in 550), product of:

   is the product of two parts.

   Field boost, idf and queryNorm:

   0.32539415 = queryWeight(payload:ces), product of:
     1 = boost (the boost in your case seems to be 1 and hence is not shown in the output)
     2.2491398 = idf(docFreq=157, maxDocs=551)
     0.14467494 = queryNorm

   Term frequency, idf and field norm:

   0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
     1.0 = tf(termFreq(payload:ces)=1)
     2.2491398 = idf(docFreq=157, maxDocs=551)
     0.03125 = fieldNorm(field=payload, doc=550)

Regards,
Jayendra

On Sat, Aug 7, 2010 at 11:02 AM, Soby Thomas <soby.thoma...@gmail.com> wrote:

> Hello Guys,
>
> I trying to understand how lucene score is calculated. So 'm using the
> searcher.explain() function. But the output it gives is really confusing
> for me.
> Below are the details of the query that I gave and o/p it gave me
>
> Query: It is definitely a CES deal that will be over in Sep or Oct of this
> year.
>
> output:
> 0.022172567 = (MATCH) product of:
>   0.07760398 = (MATCH) sum of:
>     0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>       0.32539415 = queryWeight(payload:ces), product of:
>         2.2491398 = idf(docFreq=157, maxDocs=551)
>         0.14467494 = queryNorm
>       0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>         1.0 = tf(termFreq(payload:ces)=1)
>         2.2491398 = idf(docFreq=157, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>     0.05473345 = (MATCH) weight(payload:deal in 550), product of:
>       0.23803486 = queryWeight(payload:deal), product of:
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.14467494 = queryNorm
>       0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
>         4.472136 = tf(termFreq(payload:deal)=20)
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>   0.2857143 = coord(2/7)
>
> So can someone please help me to understand the output or suggest any link
> that explains this output so that I will be grateful.
>
> Regards
> Soby
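To tie the leaf values in this output back to the formulas listed in my reply, here is a small Python sketch. It is only an illustration under the assumption that the default similarity formulas apply (tf = sqrt(freq), idf = log(numDocs/(docFreq+1)) + 1 with a natural log, coord = overlap/maxOverlap, lengthNorm = 1/sqrt(numTerms)); the function names are my own, not Lucene's.

```python
import math


def tf(freq):
    # tf = sqrt(freq)
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # idf = log(numDocs / (docFreq + 1)) + 1, using the natural logarithm;
    # the maxDocs value from the output is used as numDocs here.
    return math.log(max_docs / (doc_freq + 1)) + 1


def coord(overlap, max_overlap):
    # coord = matched query terms / total query terms
    return overlap / max_overlap


def length_norm(num_terms):
    # lengthNorm = 1 / sqrt(numTerms)
    return 1 / math.sqrt(num_terms)


print("tf(ces)   =", tf(1))          # termFreq=1
print("tf(deal)  =", tf(20))         # termFreq=20
print("idf(ces)  =", idf(157, 551))  # docFreq=157, maxDocs=551
print("idf(deal) =", idf(288, 551))  # docFreq=288, maxDocs=551
print("coord     =", coord(2, 7))

# 1024 is an ASSUMED field length, chosen only because it yields 0.03125;
# the actual number of terms in the payload field is not given in this thread.
print("lengthNorm(1024) =", length_norm(1024))
```

The tf, idf and coord results should line up with the 4.472136, 2.2491398, 1.6453081 and 0.2857143 lines in the output. The fieldNorm cannot be reproduced exactly from this thread, because the real field length and any index-time boost are not shown.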