Hi Rebecca,

Thanks for your valuable comments. Yes, I observed that once the number of terms in the document goes up, the fieldNorm value goes down correspondingly. So I think the variation in the total number of terms per document won't cause any fault in the scoring. Am I right?
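To make the observation concrete, here is a minimal sketch of how I understand the relationship, assuming the DefaultSimilarity behaviour described in the Lucene 3.0 scoring documentation: lengthNorm = 1/sqrt(numTerms), and fieldNorm = document boost * field boost * lengthNorm. The Python function names are only illustrative and are not part of Lucene. Lucene also stores the norm in a single byte, so the values reported by explain() are coarser than the ones computed here.

```python
import math


def length_norm(num_terms):
    """Length normalisation, assumed to be 1/sqrt(numTerms) as in DefaultSimilarity."""
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return 1.0 / math.sqrt(num_terms)


def field_norm(num_terms, doc_boost=1.0, field_boost=1.0):
    """Unencoded fieldNorm: boosts multiplied by the length normalisation.

    Lucene additionally compresses this value into one byte, which loses
    precision; that encoding step is deliberately left out of this sketch.
    """
    return doc_boost * field_boost * length_norm(num_terms)


# The norm shrinks as the field gets longer.
for n in (4, 16, 100):
    print(n, field_norm(n))
```

So a field with four times as many terms gets half the norm, which is what keeps a long document from winning just because it has more words.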
Manjula.

On Thu, Jul 8, 2010 at 9:34 AM, Rebecca Watson <bec.wat...@gmail.com> wrote:

> hi,
>
> > 1) Although Lucene uses tf to calculate scoring it seems to me that term
> > frequency has not been normalized. Even if I index several documents, it
> > does not normalize tf value. Therefore, since the total number of words
> > in index documents are varied, can't there be a fault in Lucene's scoring?
>
> tf = term frequency, i.e. the number of times the term appears in the document,
> while idf is inverse document frequency, a measure of how rare a term is,
> i.e. related to how many documents the term appears in.
>
> if term1 occurs more frequently in a document, i.e. tf is higher, you want to
> weight the document higher when you search for term1
>
> but if term1 is a very frequent term, i.e. in lots of documents, then it's
> probably not as important to an overall search (where we have term1, term2 etc)
> so you want to downweight it (idf comes in)
>
> then the normalisations like length normalisation (allow for 'fair' scoring
> across varied field length) come in too.
>
> the tf-idf scoring formula used by lucene is a scoring method that's been
> around a long long time... there are competing scoring metrics but that's an
> IR thing and not an argument you want to start on the lucene lists! :)
>
> these are IR ('information retrieval') concepts and you might want to start by
> going through the tf-idf scoring / some explanations for this kind of scoring.
>
> http://en.wikipedia.org/wiki/Tf%E2%80%93idf
> http://wiki.apache.org/lucene-java/InformationRetrieval
>
> > 2) What is the formula to calculate this fieldNorm value?
>
> in terms of how lucene implements its tf-idf scoring - you can see here:
> http://lucene.apache.org/java/3_0_2/scoring.html
>
> also, the lucene in action book is a really good book if you are starting out
> with lucene (and will save you a lot of grief with understanding lucene /
> setting up your application!), it covers all the basics and then moves on to
> more advanced stuff and has lots of code examples too:
> http://www.manning.com/hatcher2/
>
> hope that helps,
>
> bec :)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
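
For anyone reading this thread later, the tf, idf and length normalisation factors described in the quoted message can be sketched roughly as below. This assumes the DefaultSimilarity formulas as I understand them from the Lucene 3.0 scoring page: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1/sqrt(numTerms). The function simplified_term_score is a hypothetical helper for a single-term query; it leaves out coord, queryNorm, boosts and the one-byte norm encoding, so it will not reproduce Lucene's exact scores.

```python
import math


def tf(freq):
    """Term frequency factor, assumed to be sqrt(freq)."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency, assumed to be 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def simplified_term_score(freq, doc_freq, num_docs, num_terms):
    """Hypothetical single-term score: tf * idf^2 * lengthNorm.

    Omits coord, queryNorm, boosts and the lossy norm encoding.
    """
    length_norm = 1.0 / math.sqrt(num_terms)
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * length_norm


# Same raw term frequency, different field lengths:
short_doc = simplified_term_score(freq=2, doc_freq=9, num_docs=100, num_terms=10)
long_doc = simplified_term_score(freq=2, doc_freq=9, num_docs=100, num_terms=1000)
print(short_doc, long_doc)
```

With the same raw frequency, the shorter field scores higher, so the raw tf is not normalised on its own but the length normalisation compensates for field length.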