MoreLikeThis and TermVector relationship

Saurabh Gokhale Mon, 24 Oct 2011 21:24:21 -0700

Hi,

In my project, I want to show the user documents similar to the ones they
searched for.


*As per Lucid Solr reference guide...*
For best results, use stored TermVectors in the schema.xml for fields
specified for similarity. For example:
<field name="cat" ... termVectors="true" />
If termVectors are not stored, MoreLikeThis will generate terms from stored
fields.
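
To make the guide's statement concrete, here is a minimal plain-Java sketch of
the idea as I understand it. It is not Lucene code: the class
TermVectorSketch and the method termFrequencies are hypothetical names, and
the lowercase-and-split tokenizer is only a stand-in for a real analyzer. A
term vector is treated here as a per-document map from term to frequency. The
assumption being illustrated is that the same map can either be computed once
at index time and kept, or recomputed at query time from the stored text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Conceptual illustration only; none of these names are Lucene API.
class TermVectorSketch {

    // Stand-in for analysis: lowercase the text, split on anything that is
    // not a letter or digit, and count how often each term occurs.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        String title = "Lucene in Action: searching with Lucene";

        // "With term vectors": the map is built once at index time and kept.
        Map<String, Integer> storedVector = termFrequencies(title);

        // "Without term vectors": the map is rebuilt from the stored text
        // each time similar documents are requested.
        Map<String, Integer> recomputed = termFrequencies(title);

        System.out.println(storedVector);
        System.out.println("same terms: " + storedVector.equals(recomputed));
    }
}
```

If this picture is right, both paths yield the same terms for a field, and the
trade-off is extra index space against extra analysis work at query time.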

Now, since I am using Lucene and not Solr, I will ask my questions from the
Lucene point of view:

1. What is the difference between the two index statements below? As I
understand it, the first does not store a separate TermVector and the second
does.

new Field("title", data.getTitle(), Field.Store.NO, Field.Index.ANALYZED)
new Field("title", data.getTitle(), Field.Store.NO, Field.Index.ANALYZED,
          Field.TermVector.YES)

If that is the case, how does it affect MoreLikeThis searching?

2. Also, how much difference does it make to the match results whether I
enable TermVectors or not?
I found two interesting things:
A. The Lucene index size almost tripled (for my data) when I enabled
TermVectors.
B. When I ran MoreLikeThis on the index with TermVectors and on the index
without TermVectors explicitly enabled, the results were exactly the same.
So what is the advantage of TermVectors?

Thanks

Saurabh
