[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579707#action_12579707 ]
Mark DeSpain commented on NUTCH-620:
------------------------------------
Sure :) I'm a bit swamped at the moment, but I'll try to get a patch attached
this coming weekend. I'll see if I can drum up some relevant HTML source, too.
> BasicURLNormalizer should collapse runs of slashes with a single slash
> ----------------------------------------------------------------------
>
> Key: NUTCH-620
> URL: https://issues.apache.org/jira/browse/NUTCH-620
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.9.0
> Environment: JDK 1.6 update 5, Tomcat 6, Windows Server 2003
> Reporter: Mark DeSpain
> Priority: Minor
> Fix For: 1.0.0
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The BasicURLNormalizer should collapse runs of slash characters '/' into a
> single slash.
> For example, each of the following URLs should be normalized to
> http://lucene.apache.org/nutch/about.html:
> * http://lucene.apache.org/nutch//about.html
> * http://lucene.apache.org//nutch/about.html
> * http://lucene.apache.org/////nutch////about.html (an exaggerated example)
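A minimal sketch of the requested normalization follows. It is not the patch discussed above and not Nutch's actual BasicURLNormalizer code. The class SlashCollapser and the method collapseSlashes are hypothetical names used only for illustration. The sketch assumes the input parses with java.net.URI. It collapses slashes only in the path component, so the "//" between the scheme and the host is kept, and the query and fragment pass through unchanged.

```java
import java.net.URI;

public class SlashCollapser {

    // Hypothetical helper (not Nutch's BasicURLNormalizer implementation):
    // collapses runs of '/' in the path component only, so the "//" that
    // separates the scheme from the authority is left intact.
    public static String collapseSlashes(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null) {
            return url; // opaque URI such as "mailto:..."; no path to collapse
        }
        // Replace every run of two or more slashes with a single slash.
        String collapsed = path.replaceAll("/{2,}", "/");

        // Rebuild the URL from its raw (still percent-encoded) components.
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme()).append(':');
        }
        if (uri.getRawAuthority() != null) {
            sb.append("//").append(uri.getRawAuthority());
        }
        sb.append(collapsed);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://lucene.apache.org/nutch//about.html",
            "http://lucene.apache.org//nutch/about.html",
            "http://lucene.apache.org/////nutch////about.html"
        };
        for (String in : inputs) {
            // Each of these prints http://lucene.apache.org/nutch/about.html
            System.out.println(collapseSlashes(in));
        }
    }
}
```

Restricting the rewrite to the path avoids damaging "http://" and leaves any slashes in the query string alone. One caveat applies: a few servers treat "//" in a path as significant, so this rule can merge URLs that such a server would consider distinct.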
--
This message is automatically generated by JIRA.