Hmmm, you might be able to do the following:
1. Create a document in an in-memory index containing the web page.
2. Create a query from the keywords.
3. Search the in-memory index with that query and look at the score.
Alternatively, you could use the corpus statistics to create a
term vector from the document (as if it were a member of the
collection) and then compute the cosine similarity between that
document vector and your query (whose weights you also calculate
from your collection's stats).
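To make the cosine idea concrete, here is a rough, Lucene-free sketch in
plain Java. Everything in it is an assumption for illustration: the class
name CosineSketch, the helper names, the toy document frequencies, and the
particular weighting (sqrt(tf) times 1 + ln(numDocs / (df + 1))), which is
just one common TF-IDF variant. In practice you would pull the document
frequencies and document count from your own index and use your own
keyword weights as the query vector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of Lucene.
final class CosineSketch {

    /** Count raw term frequencies; text is lowercased and split on non-word characters. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) {
                tf.merge(tok, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Weight each term as sqrt(tf) * (1 + ln(numDocs / (df + 1))), using corpus stats. */
    static Map<String, Double> weigh(Map<String, Integer> tf,
                                     Map<String, Integer> docFreq,
                                     int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = 1.0 + Math.log((double) numDocs / (df + 1));
            weights.put(e.getKey(), Math.sqrt(e.getValue()) * idf);
        }
        return weights;
    }

    /** Cosine of the angle between two sparse vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Assumed corpus statistics (made up for this example).
        int numDocs = 1000;
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("lucene", 20);
        docFreq.put("search", 150);
        docFreq.put("index", 90);

        // Theme keywords with hand-assigned weights act as the query vector.
        Map<String, Double> theme = new HashMap<>();
        theme.put("lucene", 3.0);
        theme.put("search", 1.5);

        // In real use this text would be extracted from the fetched web page.
        String pageText = "Lucene is a search library. Lucene builds an index.";
        Map<String, Double> pageVector = weigh(termFreqs(pageText), docFreq, numDocs);

        System.out.println("similarity = " + cosine(theme, pageVector));
    }
}
```

The page is weighted as if it were a member of the collection, so rare
terms count for more than common ones, and the result lies between 0 and 1
as long as all weights are non-negative.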
Lastly, it sounds like you are essentially describing a categorization
task. Have a look at some categorization software (for instance,
Mahout can do Naive Bayes categorization or some alternatives).
Of course, I might be missing something in understanding what you are
asking, so feel free to give a shout back to discuss.
HTH,
Grant
On Feb 12, 2009, at 1:31 AM, renavatior wrote:
I am doing some research in vertical search. Therefore, I defined
weights for several keywords in my corpus that express a certain
theme. How can I later use these to compute the similarity with a
given web page (passed by URL to the compute method)? I saw the source
code of Similarity.java in Lucene, but I do not know how to use
methods such as TF, IDF, and so on.
I would really appreciate it if anyone could give me some advice. Thanks
in advance.
--
View this message in context:
http://www.nabble.com/How-to-compute-the-simlarity-of-a-web-page--tp21970680p21970680.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search