Hello,

As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. Is there, however, a method by which a site can exclude only sections of a document from indexing?
The benefit is most evident in the search result descriptions (snippets), which often contain navigation links that give no useful information about the page. As far as I know there is no standard for this. Does Nutch provide a method for excluding sections of a document? Some markers I have seen elsewhere include:

  <!-- robots content="none" -->
  not to be indexed
  <!-- /robots -->

  <!-- FreeFind Begin No Index -->
  not to be indexed
  <!-- FreeFind End No Index -->

If there is no such feature and it is deemed useful, I would be willing to implement it in code; see the sketch after my signature.

Alex
--
CCC7 D19D D107 F079 2F3D BF97 8443 DB5A 6DB8 9CE1
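P.S. For concreteness, here is a minimal sketch of the kind of marker stripping I have in mind, applied to the raw HTML before it is parsed and indexed. The class name and the place where this would hook into Nutch are only assumptions on my part, not existing Nutch APIs.

import java.util.regex.Pattern;

/**
 * Illustrative sketch only: strips sections delimited by
 * <!-- robots content="none" --> ... <!-- /robots --> or
 * <!-- FreeFind Begin No Index --> ... <!-- FreeFind End No Index -->
 * from raw HTML before it is parsed and indexed.
 */
public class SectionExclusionFilter {

  // DOTALL so a marked section may span multiple lines;
  // CASE_INSENSITIVE because markers in the wild vary in capitalization.
  private static final Pattern ROBOTS_SECTION = Pattern.compile(
      "<!--\\s*robots\\s+content=\"none\"\\s*-->.*?<!--\\s*/robots\\s*-->",
      Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

  private static final Pattern FREEFIND_SECTION = Pattern.compile(
      "<!--\\s*FreeFind Begin No Index\\s*-->.*?<!--\\s*FreeFind End No Index\\s*-->",
      Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

  /** Returns the HTML with all marked sections removed. */
  public static String stripExcludedSections(String html) {
    String result = ROBOTS_SECTION.matcher(html).replaceAll("");
    return FREEFIND_SECTION.matcher(result).replaceAll("");
  }

  // Tiny demonstration of the intended effect on a snippet of HTML.
  public static void main(String[] args) {
    String html = "<p>keep me</p>"
        + "<!-- robots content=\"none\" --><div>nav links</div><!-- /robots -->"
        + "<p>keep me too</p>";
    System.out.println(stripExcludedSections(html));
    // prints: <p>keep me</p><p>keep me too</p>
  }
}

A regex pass is just the simplest way to show the idea; in practice this would probably be better done while walking the parsed DOM so that nested or malformed markup is handled safely.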