(moved to nutch-user) House Less wrote:
In retrospect, pardon my stupidity: surely it cannot be right that the term frequency vector for a page is not present within Nutch, for it needs this to compute the score for a page given a query. I would appreciate it if you would tell me where I may find it given a document number. Thank you.
This is not a silly question. Indeed, Lucene uses term frequencies (the vector space model) when computing scores, but that doesn't necessarily mean that a term frequency vector _per_ _document_ is explicitly stored. Scores are computed from the inverted index, which records for each term the documents that contain it and how often, so no per-document vector is needed at query time. In fact, Nutch does not store this data by default. You would have to modify the indexing plugins to add this information, and then extend the Nutch API to be able to retrieve it via NutchBean.
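To illustrate why scoring does not require a stored per-document vector, here is a minimal toy sketch in plain Java. It is an assumption-laden model, not Lucene's data structures or scoring formula: the class `TermVectorSketch` and its methods are hypothetical, and the "score" is just a raw sum of term frequencies. It shows that query-time scoring only walks the postings of the query terms, while rebuilding one document's term frequency vector means scanning every postings list, which is why it would have to be stored explicitly at indexing time.

```java
import java.util.*;

public class TermVectorSketch {
    // Toy inverted index: term -> (docId -> term frequency).
    // Mirrors the idea of postings lists, not Lucene's actual file format.
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // leading punctuation yields an empty token
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Scoring touches only the postings of the query terms; no per-document
    // vector is consulted. (Raw tf sum, NOT Lucene's similarity formula.)
    Map<Integer, Integer> score(String... queryTerms) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String term : queryTerms) {
            Map<Integer, Integer> docs =
                    postings.getOrDefault(term.toLowerCase(), Collections.emptyMap());
            docs.forEach((doc, tf) -> scores.merge(doc, tf, Integer::sum));
        }
        return scores;
    }

    // Reconstructing one document's term frequency vector requires scanning
    // every term's postings list, which is expensive on a real index.
    Map<String, Integer> termFreqVector(int docId) {
        Map<String, Integer> vector = new TreeMap<>();
        postings.forEach((term, docs) -> {
            Integer tf = docs.get(docId);
            if (tf != null) vector.put(term, tf);
        });
        return vector;
    }

    public static void main(String[] args) {
        TermVectorSketch index = new TermVectorSketch();
        index.addDocument(0, "nutch crawls the web");
        index.addDocument(1, "lucene indexes the web, the whole web");
        System.out.println(index.score("web"));        // {0=1, 1=2}
        System.out.println(index.termFreqVector(1));   // {indexes=1, lucene=1, the=2, web=2, whole=1}
    }
}
```

In Lucene itself, the explicit per-document storage corresponds to enabling term vectors on a field at indexing time (for example `Field.TermVector.YES` in Lucene 1.9-era APIs), after which `IndexReader.getTermFreqVector(docNumber, field)` can return them; check the exact API against the Lucene version bundled with your Nutch release.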
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
