(moved to nutch-user)

House Less wrote:
In retrospect, pardon my stupidity: surely it cannot be right that
the term frequency vector for a page is not present within Nutch, for
it needs this to compute the score for a page given a query. I would
appreciate it if you would tell me where I may find it given a
document number. Thank you.

This is not a silly question. Lucene does use a term frequency vector model when computing scores, but that doesn't necessarily mean that the term frequency vector _per document_ is explicitly stored; in fact, Nutch does not store this data by default. You would have to modify the indexing plugins to add this information, and then extend the Nutch API to be able to retrieve it via NutchBean.
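
To illustrate why scoring does not need a stored per-document vector, here is a rough, simplified sketch in plain Python. It is not Lucene or Nutch code: the function names (build_inverted_index, score, term_vector) and the raw-frequency scoring are assumptions made up for this example. The point is only that an inverted index keeps frequencies per term, so a query is scored by reading the postings of its own terms, while getting the full vector for one document means scanning every term.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; no per-document vector is kept."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index


def score(index, query):
    """Toy score: sum raw term frequencies, reading only the query terms' postings."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return dict(scores)


def term_vector(index, doc_id):
    """Rebuild one document's term frequency vector by scanning every term."""
    return {
        term: postings[doc_id]
        for term, postings in index.items()
        if doc_id in postings
    }


# Hypothetical two-document collection, for illustration only.
docs = ["nutch crawls the web", "lucene indexes the web the fast way"]
index = build_inverted_index(docs)
```

Here score(index, "the web") touches only two postings lists, whereas term_vector(index, 0) has to walk the whole term dictionary. That cost is the reason one would store the vector explicitly at indexing time if it is needed per document.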


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
