Hello, I am currently trying to crawl the web using the Nutch 1.11 trunk version from https://github.com/apache/nutch
I am trying to use the following properties from nutch-default.xml:

<property>
  <name>http.agent.rotate</name>
  <value>false</value>
  <description>
    If true, instead of http.agent.name, alternating agent names are
    chosen from a list provided via http.agent.rotate.file.
  </description>
</property>

<property>
  <name>http.agent.rotate.file</name>
  <value>agents.txt</value>
  <description>
    File containing alternative user agent names to be used instead of
    http.agent.name on a rotating basis if http.agent.rotate is true.
    Each line of the file should contain exactly one agent
    specification including name, version, description, URL, etc.
  </description>
</property>

This is how I have modified my nutch-site.xml (not including other basic properties):

<property>
  <name>http.agent.rotate</name>
  <value>true</value>
  <description>
    If true, instead of http.agent.name, alternating agent names are
    chosen from a list provided via http.agent.rotate.file.
  </description>
</property>

<property>
  <name>http.agent.rotate.file</name>
  <value>agents.txt</value>
  <description>
    File containing alternative user agent names to be used instead of
    http.agent.name on a rotating basis if http.agent.rotate is true.
    Each line of the file should contain exactly one agent
    specification including name, version, description, URL, etc.
  </description>
</property>

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>
    Regular expression naming plugin directory names to include. Any
    plugin not matching this expression is excluded. In any case you
    need at least include the nutch-extensionpoints plugin. By default
    Nutch includes crawling just HTML and plain text via HTTP, and basic
    indexing and search plugins. In order to use HTTPS please enable
    protocol-httpclient, but be aware of possible intermittent problems
    with the underlying commons-httpclient library. Set
    parsefilter-naivebayes for classification based focused crawler.
  </description>
</property>

This is what my agents.txt looks like:

NutchTry1
NutchTry2
NutchTry3
NutchTry4
NutchTry5

It is stored inside the runtime/local/conf folder.

But when I check my logs, the agent name does not seem to change, even though protocol-http is activated via the plugin.includes property.
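To make clear what I expect to happen, here is a minimal Java sketch of my understanding of the property descriptions. It is not taken from the Nutch source: the class and method names (AgentRotationSketch, parseAgents, pickAgent) are hypothetical, "MyDefaultAgent" is a placeholder for http.agent.name, and choosing an entry at random per request is only my assumption about what "rotating" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class AgentRotationSketch {

    // Hypothetical helper: keep one agent per non-blank line, trimmed.
    static List<String> parseAgents(List<String> lines) {
        List<String> agents = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                agents.add(trimmed);
            }
        }
        return agents;
    }

    // Assumption: "rotating" means picking some entry of the list per request,
    // falling back to http.agent.name when the list is empty.
    static String pickAgent(List<String> agents, String fallbackAgentName) {
        if (agents.isEmpty()) {
            return fallbackAgentName;
        }
        int index = ThreadLocalRandom.current().nextInt(agents.size());
        return agents.get(index);
    }

    public static void main(String[] args) {
        // Same entries as my agents.txt, one per line.
        List<String> agents = parseAgents(Arrays.asList(
                "NutchTry1", "NutchTry2", "NutchTry3", "NutchTry4", "NutchTry5"));
        for (int i = 0; i < 3; i++) {
            System.out.println("User-Agent: " + pickAgent(agents, "MyDefaultAgent"));
        }
    }
}
```

In other words, I would expect the User-Agent in the logs to vary between the NutchTry names across requests, rather than always showing the value of http.agent.name.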
Could you please suggest what changes I could try, or point out anything that I may have configured incorrectly?

Thanks,
Manali