Hi, Can somebody explain lengthNorm, queryNorm and coord in Lucene? Is lengthNorm (term freq)/(total number of terms), (term freq)/(max term freq), or something else? Is queryNorm (squared term weight)/(sumOfSquaredWeights)? Why do we still need queryNorm when it does not affect the relative scores for a given query? How is the coord value calculated? Thanks.
ZZ

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: 07 July 2006 06:10
To: java-user@lucene.apache.org
Subject: Re: Lucene search formula

The formula hasn't changed (but the first printing of the book had a portion of it missing; check the javadoc for (Default?)Similarity for the real and current formula).

Here is a simple IDF example, or at least how I "visualize" IDF. You have an index with a bunch of documents and terms in it. A term T can appear in some number of documents in this index, say N. You can think of the IDF of the term T as "1/N" (not really 1/N, but log(numDocs/(docFreq+1)) + 1). The more frequent the term in the index, the smaller its weight (the less important it is during scoring).

Otis

----- Original Message ----
From: Rajiv Roopan <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, July 6, 2006 10:46:52 PM
Subject: Lucene search formula

Hello,
I was recently looking through the Lucene in Action book and came across the scoring formula. I was wondering if the formula has changed since the book was written. Also, can someone briefly explain what the IDF(t) term in the formula means? The book says that it's the inverse document frequency of the term but doesn't explain beyond that.

thanks,
rajiv

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
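
To make the IDF expression in Otis's reply concrete, here is a minimal sketch that evaluates log(numDocs/(docFreq+1)) + 1 for a few document frequencies. It does not call Lucene itself; the class name IdfSketch, the method idf and the sample numbers are made up for illustration, and it assumes the log is the natural logarithm (Math.log).

```java
// Hypothetical stand-alone sketch; not part of Lucene.
final class IdfSketch {
    private IdfSketch() {}

    // idf = ln(numDocs / (docFreq + 1)) + 1, following the expression quoted in the thread.
    // docFreq: number of documents containing the term; numDocs: documents in the index.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000; // assumed index size, for illustration only
        for (int docFreq : new int[] {1, 10, 100, 999}) {
            // Rarer terms (small docFreq) get a larger idf, common terms a smaller one.
            System.out.printf("docFreq=%d idf=%.4f%n", docFreq, idf(docFreq, numDocs));
        }
    }
}
```

With these sample values the weight shrinks as docFreq grows, and a term that occurs in 999 of 1000 documents ends up with an idf of exactly 1, which matches the "more frequent, less important" intuition above.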