I'm not sure I understand what you are asking, but I will give it a shot. See below.

On Feb 26, 2008, at 3:45 PM, Dharmalingam wrote:


Hi List,

I am pretty new to Lucene. Certainly, it is very exciting. I need to
implement a new Similarity class based on the Term Vector Space Model given
in http://www.miislita.com/term-vector/term-vector-3.html

Although that model is similar to Lucene’s model
(http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html ),
I am having a hard time extending the Similarity class to calculate that
model.

In that model, “tf” is multiplied with Idf for all terms in the index, but in Lucene “tf” is calculated only for terms in the given Query. Because of that, the norm calculation should also include “idf” for all terms. Lucene calculates the norm during indexing by “just” counting the number of terms per document. In the web formula (on miislita.com), a document norm is calculated after multiplying “tf” and “idf”.

Are you wondering if there is a way to score all documents regardless of whether the document has the term or not? I don't quite get your statement: "In that model, “tf” is multiplied with Idf for all terms in the index, but in Lucene “tf” is calculated only for terms in the given Query."

Isn't the result for those documents that don't have query terms just going to be 0, or am I not fully understanding? I briefly skimmed the paper you cite and it doesn't seem that different; it's just describing Salton's VSM, right?
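
To make the question concrete, here is a minimal sketch of classic tf-idf cosine scoring in plain Java, with no Lucene dependency. It assumes the term weight is tf * log(N / df), which is my reading of the Salton-style model and may not match the cited page exactly. The class name, method names, and toy documents are made up for illustration and are not Lucene APIs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Illustrative only: Salton-style tf-idf cosine scoring, independent of Lucene. */
class TfIdfCosineSketch {

    /** Raw term counts for one tokenized text. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Number of documents that contain each term at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : new HashSet<>(doc)) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Assumed weighting: tf * log(N / df). Terms unknown to the collection get weight 0. */
    static Map<String, Double> weights(List<String> tokens, Map<String, Integer> df, int numDocs) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(tokens).entrySet()) {
            Integer d = df.get(e.getKey());
            double idf = (d == null) ? 0.0 : Math.log((double) numDocs / d);
            w.put(e.getKey(), e.getValue() * idf);
        }
        return w;
    }

    /** Euclidean norm over the tf*idf weights of every term in the vector. */
    static double norm(Map<String, Double> w) {
        double sum = 0.0;
        for (double v : w.values()) {
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    /** Cosine of the angle between two weight vectors; 0 if either vector is all zeros. */
    static double cosine(Map<String, Double> q, Map<String, Double> d) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : q.entrySet()) {
            Double dv = d.get(e.getKey());
            if (dv != null) {
                dot += e.getValue() * dv;
            }
        }
        double denom = norm(q) * norm(d);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    /** Scores one document against a query, using statistics from the whole collection. */
    static double score(List<String> query, List<String> doc, List<List<String>> docs) {
        Map<String, Integer> df = documentFrequencies(docs);
        return cosine(weights(query, df, docs.size()), weights(doc, df, docs.size()));
    }

    public static void main(String[] args) {
        // Toy collection, already tokenized and lowercased.
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("shipment", "of", "gold", "damaged", "in", "a", "fire"),
            Arrays.asList("delivery", "of", "silver", "arrived", "in", "a", "silver", "truck"),
            Arrays.asList("shipment", "of", "gold", "arrived", "in", "a", "truck"),
            Arrays.asList("damaged", "in", "a", "fire"));
        List<String> query = Arrays.asList("gold", "silver", "truck");
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + score(query, doc, docs));
        }
    }
}
```

In this sketch the document norm is taken over the tf*idf weights of every term in the document, not only the query terms, while the dot product only picks up terms the query and document share. The last toy document shares no terms with the query, so its score comes out as 0.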



FYI: I could implement “idf” according to the miislita.com formula, but not the
“tf” and “norm”.

Could you please comment on how I can implement a new Similarity class that fits Lucene’s architecture but still implements the vector space
model given on miislita.com?

In the end, you may need to implement some lower level Query classes, but I still don't fully understand what you are trying to do, so I wouldn't head down that path just yet.

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





