As far as I understand, /robots.txt designates which files may and may
not be indexed by Nutch and other crawlers. However, is there a
method by which a site may exclude only sections of a document from indexing?
Some methods I've seen include:
<!-- robots content="none" -->
<!-- FreeFind Begin No Index -->
If there is no such feature and it is deemed useful, I would be
willing to implement it.

I think it would be interesting to have such a feature.
I don't know how often it is used in public web documents, but for intranet
crawling it could be useful.

But since there is no specification for this, you should probably
support the most widely used markers (a rough sketch of how such
marker pairs might be stripped follows the list):
* <!-- robots content="none" -->
* <noindex>
* <!-- googleon ... -->  <!-- googleoff ... -->
* .....
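
For illustration only, here is a rough Java sketch of how a parse-time
filter might drop the content between such marker pairs before the text
reaches the indexer. The class and method names are hypothetical (not
existing Nutch APIs), and only the pair-style markers mentioned above
(noindex and googleon/googleoff) are handled; a single comment like
<!-- robots content="none" --> would need its own scoping rule.

import java.util.regex.Pattern;

// Hypothetical filter: removes author-marked "do not index" sections
// from raw HTML before indexing.
public class SectionExclusionFilter {

    // Content between each begin/end pair is dropped from the indexed text.
    private static final Pattern[] EXCLUDED_SECTIONS = {
        // <!-- googleoff: index --> ... <!-- googleon: index -->
        Pattern.compile(
            "<!--\\s*googleoff:\\s*index\\s*-->.*?<!--\\s*googleon:\\s*index\\s*-->",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE),
        // <noindex> ... </noindex>
        Pattern.compile(
            "<noindex>.*?</noindex>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE)
    };

    // Returns the HTML with all marked sections removed.
    public static String stripExcludedSections(String html) {
        String result = html;
        for (Pattern p : EXCLUDED_SECTIONS) {
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "<html><body>keep this "
            + "<!-- googleoff: index -->drop this<!-- googleon: index -->"
            + " and keep this<noindex> but not this</noindex></body></html>";
        System.out.println(stripExcludedSections(page));
    }
}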

My 2 cents.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
