URI Absolutization does not follow browser behavior
---------------------------------------------------
Key: HTTPCLIENT-679
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-679
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient
Affects Versions: 3.1 RC1
Environment: HttpClient 3.1 RC1, JDK 1.6.0, Ubuntu 7.04
Reporter: Jeff Dalton
This was encountered using Heritrix to crawl a prominent website.
The URI produced by HttpClient's URI(base, relative) constructor does not
match browser behavior when the relative reference consists only of a query:
URI base = new URI("http://www.theirwebsite.com/browse/results?type=browse&att=1");
URI newUrl = new URI(base, "?sort=0&offset=11&pageSize=10");
Results in newUrl:
http://www.theirwebsite.com/browse/?sort=0&offset=11&pageSize=10
The desired result, based on what Firefox and IE do, is:
http://www.theirwebsite.com/browse/results?sort=0&offset=11&pageSize=10
These browsers treat the question mark much like a directory separator and do
not require a file to be specified before the query, so a query-only reference
keeps the last path segment of the base ("results") and replaces only the query.
HttpClient's current behavior drops that segment, which makes certain websites
impossible to crawl when HttpClient's URI class is used.
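
The expected resolution can be sketched in a few lines of plain Java. This is
only an illustration of the browser behavior described above, not HttpClient
code and not a proposed patch: the class QueryOnlyResolve and its resolve
method are hypothetical names, and the sketch handles only references that
start with "?". It keeps the base URI up to (but not including) its own query
and fragment, then appends the reference, which as far as I can tell is also
what RFC 3986 section 5.2.2 prescribes for a reference with an empty path.

```java
class QueryOnlyResolve {
    // Hypothetical helper (not part of HttpClient): resolves a reference that
    // consists only of a query ("?...") against an absolute base URI while
    // keeping the base path, including its last segment, intact.
    static String resolve(String base, String ref) {
        if (!ref.startsWith("?")) {
            throw new IllegalArgumentException(
                "only query-only references are handled: " + ref);
        }
        // Strip the base fragment first, then the base query.
        int hash = base.indexOf('#');
        String noFragment = hash >= 0 ? base.substring(0, hash) : base;
        int q = noFragment.indexOf('?');
        String noQuery = q >= 0 ? noFragment.substring(0, q) : noFragment;
        // The reference supplies the new query (and fragment, if it has one).
        return noQuery + ref;
    }

    public static void main(String[] args) {
        String base = "http://www.theirwebsite.com/browse/results?type=browse&att=1";
        // Prints http://www.theirwebsite.com/browse/results?sort=0&offset=11&pageSize=10
        System.out.println(resolve(base, "?sort=0&offset=11&pageSize=10"));
    }
}
```

A crawler could apply a special case like this before falling back to the
regular URI constructor for all other relative references.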