[ https://issues.apache.org/jira/browse/NUTCH-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann reassigned NUTCH-1995:
----------------------------------------

    Assignee: Chris A. Mattmann

> Add support for wildcard to http.robot.rules.whitelist
> ------------------------------------------------------
>
>                 Key: NUTCH-1995
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1995
>             Project: Nutch
>          Issue Type: Improvement
>          Components: robots
>    Affects Versions: 1.10
>            Reporter: Giuseppe Totaro
>            Assignee: Chris A. Mattmann
>              Labels: memex
>             Fix For: 1.11
>
>
> The {{http.robot.rules.whitelist}}
> ([NUTCH-1927|https://issues.apache.org/jira/browse/NUTCH-1927]) configuration
> parameter accepts a comma-separated list of hostnames or IP addresses for
> which robot rules parsing is ignored.
> Adding wildcard support to {{http.robot.rules.whitelist}} would simplify the
> configuration when many hostnames or addresses have to be listed. Here is an
> example:
> {noformat}
> <property>
>   <name>http.robot.rules.whitelist</name>
>   <value>*.sample.com</value>
>   <description>Comma separated list of hostnames or IP addresses to ignore
>   robot rules parsing for. Use with care and only if you are explicitly
>   allowed by the site owner to ignore the site's robots.txt!
>   </description>
> </property>
> {noformat}
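
For illustration only, here is a minimal Java sketch of how a wildcard entry such as *.sample.com might be matched against a hostname. It is not the Nutch implementation or the patch for NUTCH-1995: the class name WildcardWhitelist and the method isWhitelisted are hypothetical, and the sketch assumes that a leading "*." matches any subdomain but not the bare domain itself, and that every other entry must match exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: matches hostnames against a
// comma-separated whitelist that may contain "*." wildcard entries.
public class WildcardWhitelist {
    private final List<String> exactHosts = new ArrayList<>();
    private final List<String> suffixes = new ArrayList<>();

    public WildcardWhitelist(String commaSeparated) {
        for (String raw : commaSeparated.split(",")) {
            String entry = raw.trim().toLowerCase(Locale.ROOT);
            if (entry.isEmpty()) {
                continue; // skip empty items such as a trailing comma
            }
            if (entry.startsWith("*.")) {
                // Assumption: "*.sample.com" is stored as the suffix ".sample.com"
                suffixes.add(entry.substring(1));
            } else {
                exactHosts.add(entry);
            }
        }
    }

    public boolean isWhitelisted(String host) {
        if (host == null) {
            return false;
        }
        String h = host.trim().toLowerCase(Locale.ROOT);
        if (exactHosts.contains(h)) {
            return true;
        }
        for (String suffix : suffixes) {
            // Require at least one character before the suffix, so that
            // "evilsample.com" and the bare "sample.com" do not match.
            if (h.length() > suffix.length() && h.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WildcardWhitelist whitelist = new WildcardWhitelist("*.sample.com, 10.0.0.5");
        System.out.println(whitelist.isWhitelisted("www.sample.com")); // true
        System.out.println(whitelist.isWhitelisted("sample.com"));     // false
        System.out.println(whitelist.isWhitelisted("10.0.0.5"));       // true
    }
}
```

Storing the wildcard as a plain suffix avoids building a regular expression from configuration input. Whether the bare domain should also match a wildcard entry is a design decision the issue description leaves open.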