dealmaker wrote:
Hi,
  Does Nutch or any of its plugins support template detection?  It seems that
navigation and footer sections usually distort the ranking of search
results.  Is there an existing open source project or code that I can
integrate into Nutch to give it template detection?
Thanks.

There is no ready-made component in Nutch for this task. The task itself is complicated and there are no ideal solutions. There are several algorithms described in the literature, primarily falling into two groups: page-at-a-time (usually single pass) and whole-corpus (usually several passes). They work with varying degrees of success, strongly dependent on the test corpus.
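
To make the whole-corpus idea concrete, here is a minimal sketch in Python. It is not taken from Nutch or from any particular paper; it only illustrates the two-pass pattern. It assumes each page has already been reduced to plain text with one block per line, and the function names and the 0.8 threshold are hypothetical choices for illustration. The first pass counts how many pages each block appears in, and the second pass drops blocks that appear in most pages.

```python
from collections import Counter


def split_blocks(page_text):
    """Split a page into non-empty, whitespace-normalized blocks (one per line)."""
    return [" ".join(line.split()) for line in page_text.splitlines() if line.strip()]


def find_template_blocks(pages, min_fraction=0.8):
    """Pass 1: return blocks that occur in at least min_fraction of all pages.

    min_fraction is an assumed tuning knob, not a recommended value.
    """
    doc_freq = Counter()
    for page in pages:
        # Use a set so a block repeated within one page is counted once.
        doc_freq.update(set(split_blocks(page)))
    threshold = min_fraction * len(pages)
    return {block for block, count in doc_freq.items() if count >= threshold}


def strip_template(page_text, template_blocks):
    """Pass 2: keep only the blocks that were not flagged as template."""
    return [b for b in split_blocks(page_text) if b not in template_blocks]


# Toy corpus: shared navigation and footer, one unique content line per page.
pages = [
    "Home | About | Contact\nNutch crawls the web.\nCopyright Example Site",
    "Home | About | Contact\nLucene builds the index.\nCopyright Example Site",
    "Home | About | Contact\nHadoop runs the jobs.\nCopyright Example Site",
]

template = find_template_blocks(pages)
cleaned = [strip_template(page, template) for page in pages]
```

On this toy corpus the navigation and footer lines are flagged as template and only the unique sentence of each page survives. On a real crawl the result would depend heavily on how pages are segmented into blocks and on grouping pages by site, which is where the corpus dependence mentioned above shows up.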


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
