Julien Nioche created NUTCH-1990:
------------------------------------

Summary: Use URI.normalise() in BasicURLNormalizer
Key: NUTCH-1990
URL: https://issues.apache.org/jira/browse/NUTCH-1990
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.9
Reporter: Julien Nioche
Assignee: Julien Nioche
One of the things that [BasicURLNormalizer|https://github.com/apache/nutch/blob/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java] does is remove unnecessary dot segments ("." and "..") from the URL path. Instead of implementing this logic ourselves with an antiquated regex library, we should simply use [URI.normalize()|http://docs.oracle.com/javase/7/docs/api/java/net/URI.html#normalize()], which does the same thing and is probably more efficient.
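For illustration, here is a minimal sketch of the proposed approach. It assumes the normalizer only needs to clean up the path of an absolute URL string. The class and method names (DotSegmentNormalizerSketch, removeDotSegments) are hypothetical and not part of Nutch; only java.net.URI#normalize() is the real JDK API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical class, not part of Nutch: a sketch of dot-segment removal via java.net.URI.
class DotSegmentNormalizerSketch {

    /**
     * Hypothetical helper: removes "." and "x/.." path segments using
     * java.net.URI#normalize(). If the string cannot be parsed as a URI
     * (e.g. it contains an unescaped space), it is returned unchanged.
     */
    static String removeDotSegments(String url) {
        try {
            return new URI(url).normalize().toString();
        } catch (URISyntaxException e) {
            // Not a syntactically valid URI; leave it for other normalization steps.
            return url;
        }
    }

    public static void main(String[] args) {
        // "." is dropped and "b/.." collapses: prints http://example.com/a/c/index.html
        System.out.println(removeDotSegments("http://example.com/a/./b/../c/index.html"));
        // A ".." with no preceding segment is kept: prints http://example.com/../a.html
        System.out.println(removeDotSegments("http://example.com/../a.html"));
        // Unparsable input is returned as-is: prints http://example.com/a b
        System.out.println(removeDotSegments("http://example.com/a b"));
    }
}
```

Two points are worth checking before adopting this approach. First, new URI(String) is stricter than java.net.URL and throws URISyntaxException on characters such as unescaped spaces, which is why the sketch falls back to returning the input unchanged. Second, URI.normalize() only removes a ".." segment when a non-".." segment precedes it, so a path such as /../a.html is left as-is. That may or may not match what the existing regex-based code does.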