[ https://issues.apache.org/jira/browse/NUTCH-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-1718:
-----------------------------------
    Summary: redefine http.robots.agent as "additional agent names"  (was: update description of property http.robots.agent)

> redefine http.robots.agent as "additional agent names"
> ------------------------------------------------------
>
>                 Key: NUTCH-1718
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1718
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.7, 2.2, 2.2.1
>            Reporter: Sebastian Nagel
>            Priority: Trivial
>             Fix For: 1.9
>
>         Attachments: NUTCH-1718-trunk.v1.patch, NUTCH-1718-trunk.v2.patch
>
>
> The description of property http.robots.agents in nutch-default.xml recommends
> adding a '*' to the list of agent names. This will cause the same problem as
> described in NUTCH-1715. The description should be updated. The same applies to
> its statement about "order of precedence": since NUTCH-1031, precedence is
> dictated only by the ordering of user agents in robots.txt.
> {code:xml}
> <property>
>   <name>http.robots.agents</name>
>   <value>*</value>
>   <description>The agent strings we'll look for in robots.txt files,
>   comma-separated, in decreasing order of precedence. You should
>   put the value of http.agent.name as the first agent name, and keep the
>   default * at the end of the list. E.g.: BlurflDev,Blurfl,*
>   </description>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
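
For illustration only, a rough sketch of how a site configuration might look under the redefinition named in the summary. It assumes that http.robots.agents lists only names in addition to http.agent.name and contains no '*' entry. The agent names are the placeholders from the description quoted in this issue, the file name nutch-site.xml is an assumption, and nothing here is taken from the attached patches.

{code:xml}
<!-- Hypothetical nutch-site.xml fragment; values are placeholders. -->
<property>
  <name>http.agent.name</name>
  <!-- Primary agent name of the crawler. -->
  <value>BlurflDev</value>
</property>
<property>
  <name>http.robots.agents</name>
  <!-- Assumed semantics: additional agent names only, without '*'. -->
  <value>Blurfl</value>
</property>
{code}

With a list like this, which robots.txt group applies would depend only on the ordering of user agents in robots.txt, as the issue notes for NUTCH-1031, and not on the order of names in the property value.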