thanks Jayendra... it was really helpful

On Sat, Aug 7, 2010 at 6:07 PM, jayendra patil <jayendra.pa...@gmail.com> wrote:
> Trying to put up an explanation :-
>
> 0.022172567 = (MATCH) product of:
>   0.07760398 = (MATCH) sum of:
>     0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>       0.32539415 = queryWeight(payload:ces), product of:
>         2.2491398 = *idf*(docFreq=157, maxDocs=551)
>         0.14467494 = queryNorm
>       0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>         1.0 = *tf*(termFreq(payload:ces)=1)
>         2.2491398 = *idf*(docFreq=157, maxDocs=551)
>         0.03125 = *fieldNorm*(field=payload, doc=550)
>     0.05473345 = (MATCH) weight(payload:deal in 550), product of:
>       0.23803486 = queryWeight(payload:deal), product of:
>         1.6453081 = *idf*(docFreq=288, maxDocs=551)
>         0.14467494 = *queryNorm*
>       0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
>         4.472136 = tf(termFreq(payload:deal)=20)
>         1.6453081 = idf(docFreq=288, maxDocs=551)
>         0.03125 = fieldNorm(field=payload, doc=550)
>   0.2857143 = coord(2/7)
>
> 1. tf = term frequency in the document = a measure of how often a term appears in the document
>    - Implementation: sqrt(freq)
>    - Implication: the more often a term occurs in a document, the greater its score
>    - Rationale: documents which contain more of a term are generally more relevant
> 2. idf = inverse document frequency = a measure of how often the term appears across the index
>    - Implementation: log(numDocs/(docFreq+1)) + 1
>    - Implication: the more documents a term occurs in, the lower its score
>    - Rationale: common terms are less important than uncommon ones
> 3. coord = number of terms in the query that were found in the document
>    - Implementation: overlap / maxOverlap
>    - Implication: of the documents matching the query, one that contains more of the query terms gets a higher score
>    - Rationale: self-explanatory
> 4. fieldNorm
>    1. lengthNorm = a measure of the importance of a term according to the total number of terms in the field
>       - Implementation: 1/sqrt(numTerms)
>       - Implication: a term matched in a field with fewer terms gets a higher score
>       - Rationale: a term in a field with fewer terms is more important than one in a field with more
>    2. boost (index) = boost of the field at index time
>       - If an index-time boost is specified, the fieldNorm value in the score includes it.
>    3. boost (query) = boost of the field at query time
> 5. queryNorm = normalization factor so that queries can be compared
>    - queryNorm is not related to the relevance of the document; it only tries to make scores from different
>      queries comparable. It is implemented as 1/sqrt(sumOfSquaredWeights).
>
> When you search for the query: *It is definitely a CES deal that will be over in Sep or Oct of this year.*
>
> 1. Lucene tries to match each word in your query against each field you have specified to be searched on,
>    e.g. payload in your case.
> 2. In your example it found matches only on ces and deal, hence only those two terms appear in the explanation.
> 3. The number of query terms found in the document contributes the coord factor:
>    0.2857143 = coord(2/7), i.e. 2 words matched out of 7.
> 4. idf(docFreq=157, maxDocs=551) indicates the rarity of the term: docFreq is the number of documents that
>    contain the word in that field, and maxDocs is the total number of documents in the index.
> 5. tf(termFreq(payload:ces)=1) specifies the number of times the term occurs in the field, 1 in this case.
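As an illustration of the formulas above (a minimal sketch only, not the actual Lucene Similarity source), the leaf numbers in the explain output can be reproduced like this in Java; the class name ScoreFactors is made up for the example:

// Recompute the leaf values of the explain output using the
// DefaultSimilarity-style formulas listed above. Illustration only.
public class ScoreFactors {

    // tf: square root of the raw term frequency in the field
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf: log(numDocs / (docFreq + 1)) + 1, using the natural log
    static double idf(int docFreq, int maxDocs) {
        return Math.log((double) maxDocs / (docFreq + 1)) + 1.0;
    }

    // lengthNorm: 1 / sqrt(number of terms in the field). Lucene also folds in
    // any index-time boost and stores the result in a single byte, which is why
    // fieldNorm shows up as a coarse value such as 0.03125 (= 1/32).
    static double lengthNorm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    public static void main(String[] args) {
        System.out.println(idf(157, 551)); // ~2.2491  (payload:ces)
        System.out.println(idf(288, 551)); // ~1.6453  (payload:deal)
        System.out.println(tf(1));         // 1.0      (ces occurs once)
        System.out.println(tf(20));        // ~4.4721  (deal occurs 20 times)
        // queryNorm = 1/sqrt(sumOfSquaredWeights) cannot be recomputed from the
        // explanation alone: the sum runs over all 7 query terms, including the
        // 5 that did not match this document.
    }
}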
> The score for each field match is the product of the following:
>
> 0.02287053 = (MATCH) weight(payload:ces in 550), product of:
>
> boost, idf, and queryNorm:
>   0.32539415 = queryWeight(payload:ces), product of:
>     1 = boost (the boost in your case seems to be 1, and hence is not shown in the explanation)
>     2.2491398 = idf(docFreq=157, maxDocs=551)
>     0.14467494 = queryNorm
>
> term frequency, idf, and field norm:
>   0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
>     1.0 = tf(termFreq(payload:ces)=1)
>     2.2491398 = idf(docFreq=157, maxDocs=551)
>     0.03125 = fieldNorm(field=payload, doc=550)
>
> Regards,
> Jayendra
>
> On Sat, Aug 7, 2010 at 11:02 AM, Soby Thomas <soby.thoma...@gmail.com> wrote:
>
> > Hello Guys,
> >
> > I am trying to understand how the Lucene score is calculated, so I'm using the
> > searcher.explain() function. But the output it gives is really confusing for me.
> > Below are the details of the query that I gave and the output it gave me.
> >
> > Query: *It is definitely a CES deal that will be over in Sep or Oct of this year.*
> >
> > *Output*:
> > 0.022172567 = (MATCH) product of:
> >   0.07760398 = (MATCH) sum of:
> >     0.02287053 = (MATCH) weight(payload:ces in 550), product of:
> >       0.32539415 = queryWeight(payload:ces), product of:
> >         2.2491398 = idf(docFreq=157, maxDocs=551)
> >         0.14467494 = queryNorm
> >       0.07028562 = (MATCH) fieldWeight(payload:ces in 550), product of:
> >         1.0 = tf(termFreq(payload:ces)=1)
> >         2.2491398 = idf(docFreq=157, maxDocs=551)
> >         0.03125 = fieldNorm(field=payload, doc=550)
> >     0.05473345 = (MATCH) weight(payload:deal in 550), product of:
> >       0.23803486 = queryWeight(payload:deal), product of:
> >         1.6453081 = idf(docFreq=288, maxDocs=551)
> >         0.14467494 = queryNorm
> >       0.2299388 = (MATCH) fieldWeight(payload:deal in 550), product of:
> >         4.472136 = tf(termFreq(payload:deal)=20)
> >         1.6453081 = idf(docFreq=288, maxDocs=551)
> >         0.03125 = fieldNorm(field=payload, doc=550)
> >   0.2857143 = coord(2/7)
> >
> > So can someone please help me understand this output, or suggest a link that
> > explains it? I would be grateful.
> >
> > Regards,
> > Soby
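Putting the two messages together, the leaf values in the explain output compose into the final score exactly as described: sum the per-term weights, then multiply by coord. A minimal Java sketch (an illustration using only the numbers shown above, not Lucene code; the class name ExplainRecompose is made up):

// Recompose the final score of 0.022172567 from the explain leaves.
public class ExplainRecompose {
    public static void main(String[] args) {
        // weight(term) = queryWeight(term) x fieldWeight(term)
        // queryWeight  = boost x idf x queryNorm
        // fieldWeight  = tf x idf x fieldNorm
        double queryNorm = 0.14467494;
        double fieldNorm = 0.03125;

        double weightCes  = (1.0 * 2.2491398 * queryNorm)      // queryWeight(payload:ces)
                          * (1.0 * 2.2491398 * fieldNorm);     // fieldWeight(payload:ces)
        double weightDeal = (1.0 * 1.6453081 * queryNorm)      // queryWeight(payload:deal)
                          * (4.472136 * 1.6453081 * fieldNorm);// fieldWeight(payload:deal)

        double coord = 2.0 / 7.0;          // 2 of the 7 query terms matched
        double score = (weightCes + weightDeal) * coord;

        System.out.println(score);         // ~0.02217, matching the top line above
    }
}

Running this prints roughly 0.02217, which matches the 0.022172567 at the top of the explain output.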