Sebastian Nagel created NUTCH-2705:
--------------------------------------

             Summary: urlfilter-validator rejects IPv6 URLs
                 Key: NUTCH-2705
                 URL: https://issues.apache.org/jira/browse/NUTCH-2705
             Project: Nutch
          Issue Type: Bug
          Components: plugin
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16
The plugin urlfilter-validator rejects URLs that use an IPv6 address as hostname/authority (written in the bracketed form specified by [RFC 2732|https://tools.ietf.org/html/rfc2732]). The leading "-" in the filterchecker output marks the URL as rejected:
{noformat}
% echo "http://[2010:836B:4179::836B:4179]/" \
   | bin/nutch filterchecker -filterName urlfilter-validator -stdin
Checking combination of these URLFilters: UrlValidator
-http://[2010:836B:4179::836B:4179]/
{noformat}

We should also consider using the class [UrlValidator|https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html] from commons-validator directly instead of a modified copy. This would let us pick up updates and improvements with little effort. IPv6 is already supported there, see the [class implementation|https://commons.apache.org/proper/commons-validator/apidocs/src-html/org/apache/commons/validator/routines/UrlValidator.html#line.380].
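To illustrate the idea of delegating IPv6 syntax checks to a maintained library instead of a hand-modified copy, here is a minimal sketch. It swaps in the JDK's java.net.URI rather than commons-validator's UrlValidator (so it runs without extra dependencies); java.net.URI accepts RFC 2732 bracketed IPv6 literals and throws on malformed ones. The class name Ipv6UrlCheck, the method filter() and the scheme allow-list are hypothetical and not part of Nutch; only the "return null to reject" convention mirrors Nutch's URLFilter interface.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class Ipv6UrlCheck {

    // Hypothetical allow-list, mirroring the http/https/ftp defaults of
    // commons-validator's UrlValidator.
    private static final Set<String> SCHEMES = Set.of("http", "https", "ftp");

    /**
     * Returns the URL unchanged if it passes, or null to reject it
     * (the same convention Nutch URL filters use).
     */
    static String filter(String url) {
        try {
            // java.net.URI parses RFC 2732 bracketed IPv6 literals and throws
            // URISyntaxException for a malformed one; parseServerAuthority()
            // also throws if the authority is not a server-based authority.
            URI uri = new URI(url).parseServerAuthority();
            String scheme = uri.getScheme();
            if (scheme == null || !SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
                return null;
            }
            // For an IPv6 literal, getHost() keeps the brackets,
            // e.g. "[2010:836B:4179::836B:4179]".
            String host = uri.getHost();
            return (host == null || host.isEmpty()) ? null : url;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://[2010:836B:4179::836B:4179]/",
            "http://[2010:836B:4179::836B:4179]:8080/index.html",
            "http://[not-an-ipv6]/",
            "http://example.com/",
        };
        // Same "+" (accepted) / "-" (rejected) markers as bin/nutch filterchecker.
        for (String s : samples) {
            System.out.println((filter(s) != null ? "+" : "-") + s);
        }
    }
}
```

Running main() should mark both bracketed IPv6 samples and example.com with "+" and the malformed literal with "-". A real plugin fix would more likely wrap commons-validator's UrlValidator as the issue proposes, which also covers checks (e.g. on path and query syntax) that this sketch leaves out.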