Zee,

> My sysadm refuses to change the robots.txt, citing the following reason:
>
> The moment he allows a specific agent, a lot of crawlers impersonate
> that user agent and try to crawl that site.
>
> Are you saying there is no way to configure Nutch to ignore robots.txt?
We had a similar situation. We modified the parse-html plugin, adding a
configurable flag that controls whether robots.txt is honored. Works great.

JohnM

--
john mendenhall
j...@surfutopia.net
surf utopia internet services
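The flag described above could be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the actual Nutch plugin code: the class name `RobotsGate`, the interface `RobotRules`, and the property name `fetcher.robots.obey` are all invented for the example.

```java
import java.util.Properties;

// Hypothetical sketch of gating robots.txt enforcement behind a config flag,
// in the spirit of the plugin modification described above.
// All names here are invented; they are not real Nutch APIs.
public class RobotsGate {
    private final boolean obeyRobots;

    public RobotsGate(Properties conf) {
        // "fetcher.robots.obey" is an invented property name for illustration.
        // Defaults to "true": honor robots.txt unless explicitly disabled.
        this.obeyRobots = Boolean.parseBoolean(
                conf.getProperty("fetcher.robots.obey", "true"));
    }

    // Returns true when the URL may be fetched.
    public boolean isAllowed(String url, RobotRules rules) {
        if (!obeyRobots) {
            return true; // flag off: skip robots.txt entirely
        }
        return rules.allows(url);
    }

    // Minimal stand-in for a parsed robots.txt rule set.
    public interface RobotRules {
        boolean allows(String url);
    }
}
```

With the flag left at its default, disallowed URLs are skipped as usual; setting the property to "false" bypasses the robots.txt check entirely, which matches the behavior John describes.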