[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899029#comment-13899029 ]
Sebastian Nagel commented on NUTCH-1718:
----------------------------------------

Hi [~tejasp],

+1 to "redefine" {{http.robots.agents}} as "additional agent names": this makes it simpler for polite users, who definitely should use the same user agent name in the HTTP header and in robots.txt.

> update description of property http.robots.agent
> ------------------------------------------------
>
>                 Key: NUTCH-1718
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1718
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.7, 2.2, 2.2.1
>            Reporter: Sebastian Nagel
>            Priority: Trivial
>             Fix For: 2.3, 1.8
>
>         Attachments: NUTCH-1718-trunk.v1.patch
>
>
> The description of property http.robots.agents in nutch-default.xml recommends
> adding a '*' to the list of agent names. This will cause the same problem as
> described in NUTCH-1715. The description should be updated. The same applies to
> the "order of precedence", which since NUTCH-1031 is dictated only by the
> ordering of user agents in robots.txt.
> {code:xml}
> <property>
>   <name>http.robots.agents</name>
>   <value>*</value>
>   <description>The agent strings we'll look for in robots.txt files,
>   comma-separated, in decreasing order of precedence. You should
>   put the value of http.agent.name as the first agent name, and keep the
>   default * at the end of the list. E.g.: BlurflDev,Blurfl,*
>   </description>
> </property>
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)