[ 
https://issues.apache.org/jira/browse/NUTCH-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-436:
-------------------------------

    Attachment: NUTCH-436-20070304.patch

NUTCH-436-20070304.patch fixes the handling of params information in the 
base URL.  When a new URL is created from a base URL and a target path string, 
java.net.URL behaves correctly if the target contains params information and 
the base does not.  If the base contains params information, however, the URL 
class strips it from the resulting URL.  This patch is a workaround that moves 
the base's params information into the target so that the URL class handles it 
correctly.
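
For illustration only, here is a minimal sketch of the general idea of 
rewriting the target before handing it to java.net.URL.  It is not the code in 
the attached patch: the class name RelativeUrlSketch and the helper resolve() 
are hypothetical, and the sketch assumes that copying the last segment of the 
base path into a target with an empty path is enough to get the results that 
the issue lists as correct.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical illustration; not taken from NUTCH-436-20070304.patch.
class RelativeUrlSketch {

    /**
     * Resolves target against base. When the target has an empty path
     * (it starts with '?' or ';'), the last segment of the base path is
     * copied into the target first, so that java.net.URL keeps it.
     */
    @SuppressWarnings("deprecation") // URL constructors are deprecated on newer JDKs
    public static URL resolve(URL base, String target) throws MalformedURLException {
        // java.net.URL does not split out params, so getPath() still
        // contains them, e.g. "/b/c/d;p".
        String basePath = base.getPath();
        String lastSegment = basePath.substring(basePath.lastIndexOf('/') + 1);

        if (target.startsWith("?")) {
            // Query-only target: keep the segment and its params ("d;p?y").
            target = lastSegment + target;
        } else if (target.startsWith(";")) {
            // Params-only target: keep the segment name, replace its params ("d;x").
            int semi = lastSegment.indexOf(';');
            String name = semi >= 0 ? lastSegment.substring(0, semi) : lastSegment;
            target = name + target;
        }
        // Every other target is passed through unchanged.
        return new URL(base, target);
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://a/b/c/d;p?q#f");
        for (String target : new String[] {"?y", ";x"}) {
            System.out.println(target + "  plain:     " + new URL(base, target));
            System.out.println(target + "  rewritten: " + resolve(base, target));
        }
    }
}
```

With the base URL and embedded URLs from the issue, the rewritten calls are 
meant to produce http://a/b/c/d;p?y and http://a/b/c/d;x.  The sketch does not 
try to cover params on earlier path segments or params that contain a '/'.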

> Incorrect handling of relative paths when the embedded URL path is empty
> ------------------------------------------------------------------------
>
>                 Key: NUTCH-436
>                 URL: https://issues.apache.org/jira/browse/NUTCH-436
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: Andrew Groh
>         Assigned To: Dennis Kubes
>            Priority: Critical
>         Attachments: NUTCH-436-20070304.patch
>
>
> If you have a base URL of the form:
> http://a/b/c/d;p?q#f
> Embedded URL: ?y
> Correct Absolute URL: http://a/b/c/d;p?y 
> Nutch Generated URL: http://a/b/c/?y
> Embedded URL: ;x
> Correct Absolute URL: http://a/b/c/d;x 
> Nutch Generated URL: http://a/b/c/;x
> See section 4, steps 5-7 of RFC 1808 for the definition of the correct set of 
> steps, and section 5.1 for examples:
> http://www.ietf.org/rfc/rfc1808.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
