Hello,

As far as I understand, /robots.txt designates which files may and may
not be indexed by Nutch and other crawlers. However, is there a method
by which a site may exclude only sections of a document?
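
For reference, robots.txt operates only at the path level, so the
finest granularity it can express is a whole file, e.g.:

User-agent: *
Disallow: /private/

It has no syntax for marking up part of a single page.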

The benefit is most evident in the search hit result descriptions
(snippets), which often contain navigation links that give little
useful information about the page. As far as I know, there is no
standard for this. Does Nutch provide a method for excluding sections
of a document? Some methods I've seen include (a rough sketch of how a
crawler might strip such sections follows the examples):

<!-- robots content="none" -->

not to be indexed

<!-- /robots -->

<!-- FreeFind Begin No Index -->

not to be indexed

<!-- FreeFind End No Index -->
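
To make the idea concrete, here is a rough sketch of how such markers
could be stripped from a page before indexing. This is not existing
Nutch code; the class name is hypothetical and the pattern follows the
first marker syntax above:

import java.util.regex.Pattern;

// Hypothetical helper: removes sections wrapped in
// <!-- robots content="none" --> ... <!-- /robots -->
// before the page text reaches the indexer.
public class SectionExcluder {

    private static final Pattern EXCLUDED_SECTION = Pattern.compile(
        "<!--\\s*robots\\s+content=\"none\"\\s*-->.*?<!--\\s*/robots\\s*-->",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static String stripExcludedSections(String html) {
        // Non-greedy match, so each marked section is removed separately.
        return EXCLUDED_SECTION.matcher(html).replaceAll("");
    }
}

The FreeFind-style markers could be handled the same way with a second
pattern.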


If there is no such feature and it is deemed useful, I would be
willing to implement it.

Alex
--
CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1
