Compare with classical VSM, lucene just ignore the denominator (|Q|*|D|) of similarity formula, but it add norm(t,d) and coord(q,d) to calculate the fraction of terms in Query and Doc, so it's a modified implementation of VSM in practice. Do you just want to verify which implementation of VSM in "ieee-sw-rank" is more precise in practice by lucene? If so, it's an useful experiment.
2008/2/27, Dharmalingam <[EMAIL PROTECTED]>: > > > Hi List, > > I am pretty new to Lucene. Certainly, it is very exciting. I need to > implement a new Similarity class based on the Term Vector Space Model > given > in http://www.miislita.com/term-vector/term-vector-3.html > > Although that model is similar to Lucene's model > ( > http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html > ), > I am having hard time to extend the Similarity class to calculate that > model. > > In that model, "tf" is multiplied with Idf for all terms in the index, but > in Lucene "tf" is calculated only for terms in the given Query. Because of > that effect, the norm calculation should also include "idf" for all terms. > Lucene calculates the norm, during indexing, by "just" counting the number > of terms per document. In the web formula (in miislita.com), a document > norm > is calculated after multiplying "tf" and "idf". > > FYI: I could implement "idf" according to miisliat.com formula, but not > the > "tf" and "norm" > > Could you please comment me how I can implement a new Similarity class > that > will fit in the Lucene's architecture, but still implement the vector > space > model given in miislita.com > > Thanks a lot for your comments, > > Dharma > > > -- > View this message in context: > http://www.nabble.com/Vector-Space-Model%3A-New-Similarity-Implementation-Issues-tp15696719p15696719.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >