[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark DeSpain updated NUTCH-620:
-------------------------------
Attachment: patch.txt
Here is a patch that updates BasicURLNormalizer so that it collapses runs of
adjacent slashes into a single slash. It also updates the corresponding unit test.
> BasicURLNormalizer should collapse runs of slashes with a single slash
> ----------------------------------------------------------------------
>
> Key: NUTCH-620
> URL: https://issues.apache.org/jira/browse/NUTCH-620
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.9.0
> Environment: JDK 1.6 update 5, Tomcat 6, Windows Server 2003
> Reporter: Mark DeSpain
> Priority: Minor
> Fix For: 1.0.0
>
> Attachments: patch.txt
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The BasicURLNormalizer should collapse each run of slash characters ('/')
> into a single slash.
> For example, the following URLs should be normalized to
> http://lucene.apache.org/nutch/about.html
> * http://lucene.apache.org/nutch//about.html
> * http://lucene.apache.org//nutch/about.html
> * http://lucene.apache.org/////nutch////about.html (an exaggerated example)
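The normalization described in the issue can be sketched in Java as follows. This is a minimal, hypothetical illustration. The class name SlashCollapser and the method collapseSlashes are invented for this example, and the code is neither the contents of patch.txt nor Nutch's BasicURLNormalizer API. It assumes absolute, hierarchical URLs. It collapses slashes only in the path, so the "//" after "http:" and any slashes inside the query string or fragment are left alone.

```java
import java.net.URI;

/**
 * Hypothetical sketch of slash collapsing; NOT the code from patch.txt
 * and NOT Nutch's BasicURLNormalizer API.
 */
public class SlashCollapser {

    /**
     * Replaces each run of two or more '/' characters in the path component
     * with a single '/'. The scheme/authority separator "//", the query and
     * the fragment are copied through unchanged. Relative or opaque URIs
     * (e.g. "mailto:...") are returned as-is.
     */
    public static String collapseSlashes(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        String path = uri.getRawPath();
        if (uri.isOpaque() || uri.getScheme() == null || path == null) {
            return url;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append(':');
        if (uri.getRawAuthority() != null) {
            sb.append("//").append(uri.getRawAuthority());
        }
        // Only the path is rewritten; raw getters keep percent-encoding intact.
        sb.append(path.replaceAll("/{2,}", "/"));
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://lucene.apache.org/nutch//about.html",
            "http://lucene.apache.org//nutch/about.html",
            "http://lucene.apache.org/////nutch////about.html"
        };
        for (String in : inputs) {
            // Each input should print http://lucene.apache.org/nutch/about.html
            System.out.println(collapseSlashes(in));
        }
    }
}
```

The sketch parses the URL first instead of running a regex over the whole string. A blanket replacement of "//" would also damage the "http://" prefix and could alter query parameters that legitimately contain slashes.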