hi,

> 1) Although Lucene uses tf to calculate scoring it seems to me that term
> frequency has not been normalized. Even if I index several documents, it
> does not normalize tf value. Therefore, since the total number of words
> in index documents are varied, can't there be a fault in Lucene's scoring?

tf = term frequency, i.e. the number of times the term appears in the document, while idf = inverse document frequency, a measure of how rare a term is, i.e. related to how many documents the term appears in.

if term1 occurs more frequently in a document (tf is higher), you want to weight that document higher when you search for term1. but if term1 is a very frequent term, i.e. it is in lots of documents, then it's probably not as important to an overall search (where we have term1, term2 etc.), so you want to downweight it. that is where idf comes in.

then the normalisations come in too, like length normalisation, which allows for 'fair' scoring across varied field lengths. so the raw tf is not divided by the document length, but the length is still accounted for as a separate factor; in lucene it is folded into the fieldNorm value you ask about below.

the tf-idf scoring formula used by lucene is a scoring method that's been around a long, long time... there are competing scoring metrics, but that's an IR thing and not an argument you want to start on the lucene lists! :)

these are IR ('information retrieval') concepts, and you might want to start by going through some explanations of tf-idf scoring:

http://en.wikipedia.org/wiki/Tf%E2%80%93idf
http://wiki.apache.org/lucene-java/InformationRetrieval

> 2) What is the formula to calculate this fieldNorm value?

in terms of how lucene implements its tf-idf scoring, you can see here:

http://lucene.apache.org/java/3_0_2/scoring.html

also, the lucene in action book is a really good book if you are starting out with lucene (and will save you a lot of grief with understanding lucene / setting up your application!). it covers all the basics, then moves on to more advanced stuff, and has lots of code examples too:

http://www.manning.com/hatcher2/

hope that helps,

bec :)
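
p.s. a rough sketch of how the factors above fit together, in python rather than java just to keep it short. the formulas are the DefaultSimilarity ones as documented for the 3.0 Similarity class (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)), so check them against the version you are running. the function names are made up for illustration and are not lucene API, and the sketch leaves out coord, queryNorm and the lossy one-byte encoding of the stored norm.

```python
import math

def tf(freq):
    # term frequency factor: grows with the raw count, but sub-linearly
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # inverse document frequency: rarer terms get a higher weight
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # length normalisation: shorter fields get a higher norm
    return 1.0 / math.sqrt(num_terms)

def term_score(freq, doc_freq, num_docs, num_terms, boost=1.0):
    # simplified single-term score (assumption: no coord, no queryNorm,
    # and the norm is used at full precision rather than one byte)
    field_norm = boost * length_norm(num_terms)
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * field_norm

# same term count, same index statistics, different field lengths
short_doc = term_score(freq=2, doc_freq=10, num_docs=1000, num_terms=16)
long_doc = term_score(freq=2, doc_freq=10, num_docs=1000, num_terms=1600)
print(short_doc > long_doc)
```

so with the same raw tf, the shorter field scores higher, which is the normalisation the first question was asking about: it happens through the norm, not by dividing tf itself.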