As far as I understand, /robots.txt designates which files may and may not be indexed by Nutch and other crawlers. However, is there a method by which a site may exclude only sections of a document? Some methods I've seen include:

<!-- robots content="none" -->
<!-- FreeFind Begin No Index -->

If there is no such feature and it is deemed useful, I would be willing to implement it in code.
I think it could be interesting to have such a feature. I don't know whether it is widely used in online documents, but for intranet crawling it could be useful. Since there is no specification for this, you should probably support the most common markers:

* <!-- robots content="none" -->
* <noindex>
* <!-- googleon ... --> <!-- googleoff ... -->
* .....

My 2 cents.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
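To make the idea concrete, here is a minimal sketch of how such markers could be handled in a Java parsing filter. This is not Nutch's actual API; the class and method names are hypothetical, and it only handles the googleoff/googleon pair as an example (each marker family would need its own pattern):

```java
import java.util.regex.Pattern;

public class NoIndexStripper {

    // Hypothetical helper: removes spans bracketed by
    // <!-- googleoff: index --> ... <!-- googleon: index -->
    // before the text is handed to the indexer.
    private static final Pattern EXCLUDED = Pattern.compile(
        "<!--\\s*googleoff:\\s*index\\s*-->.*?<!--\\s*googleon:\\s*index\\s*-->",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static String strip(String html) {
        // Non-greedy match (.*?) so each marker pair is removed
        // independently rather than everything between the first
        // "off" and the last "on".
        return EXCLUDED.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "keep this"
            + "<!-- googleoff: index --> hide this <!-- googleon: index -->"
            + " and keep this";
        System.out.println(strip(page));
    }
}
```

Other marker styles (e.g. <noindex>...</noindex>) would follow the same pattern-and-replace approach, ideally made configurable so intranet deployments can register their own markers.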