DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36932>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=36932





------- Additional Comments From [EMAIL PROTECTED]  2005-10-05 21:32 -------
We've hit the same problem with our Heritrix crawler, which uses HttpClient. IE
and Firefox accept URIs that are incompletely or inconsistently encoded (and
often send them unaltered in HTTP requests), so such URIs stay in use rather
than getting fixed. As a result, valuable content that crawlers and other web
applications want to reach is available to browsers but not to HttpClient.
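To make the failure concrete, here is a small sketch that uses the JDK's java.net.URI as a stand-in for a strict RFC 2396 parser. This is an assumption: HttpClient's own URI class is not used here, and its exact error behavior may differ. The sample URIs are invented for illustration. One contains a literal space, one an unescaped '|', and one a '%' that is not followed by two hex digits. Browsers will typically still request all three, but the strict parser rejects them.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class StrictUriDemo {

    /** Returns true if the strict JDK parser accepts the string as a URI. */
    static boolean strictlyValid(String s) {
        try {
            new URI(s); // throws URISyntaxException on characters RFC 2396 disallows
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical examples of URIs that browsers commonly send as-is.
        String[] samples = {
            "http://example.com/my page.html",   // unescaped space
            "http://example.com/search?q=a|b",   // unescaped '|'
            "http://example.com/100%.html",      // '%' not followed by two hex digits
            "http://example.com/my%20page.html"  // properly escaped, for comparison
        };
        for (String s : samples) {
            System.out.println((strictlyValid(s) ? "accepted: " : "rejected: ") + s);
        }
    }
}
```

Only the last, properly escaped sample is accepted. That is the gap a lax mode would need to close.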

Under the same pragmatic philosophy that gives HttpClient its 'compatibility'
cookie mode, HttpClient should offer a 'lax' (or 'loose') URI option that
tolerates the deviations from the URI specs that are prevalent in the real world.

Our current workaround, which may not be portable to other projects, consists of
a 'LaxURI' subclass of URI, a 'LaxURLCodec' class, and a change to the
HttpMethodBase(String uri) constructor so that it uses LaxURI where it currently
uses HttpClient's URI.
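The core idea of a lax codec can be sketched as a lenient escaping pass that runs before strict parsing. The following is a minimal, hypothetical illustration and is not the Heritrix LaxURI/LaxURLCodec code. The class name LaxEscaper and its method laxEscape are invented for this example. The sketch makes these assumptions:

* Existing valid %XX escapes are preserved, so nothing is double-encoded.
* A stray '%' becomes %25.
* Only the first '#' is treated as the fragment delimiter.
* Every other character outside the RFC 2396 unreserved/reserved sets is percent-encoded as UTF-8.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class LaxEscaper {

    // Non-alphanumeric characters RFC 2396 allows unescaped: unreserved
    // marks followed by reserved delimiters. '#' is handled separately.
    private static final String ALLOWED = "-_.!~*'()" + ";/?:@&=+$,";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isAsciiAlnum(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /**
     * Hypothetical lenient escaper: keeps valid %XX escapes, escapes a
     * stray '%' as %25, keeps the first '#', and percent-encodes (as UTF-8)
     * every other character a strict RFC 2396 parser would reject.
     */
    static String laxEscape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        boolean seenHash = false;
        int i = 0;
        while (i < raw.length()) {
            int cp = raw.codePointAt(i);
            if (cp == '%') {
                boolean validEscape = i + 2 < raw.length()
                        && isHex(raw.charAt(i + 1)) && isHex(raw.charAt(i + 2));
                // A valid escape is copied; its two hex digits follow as alnum chars.
                out.append(validEscape ? "%" : "%25");
            } else if (cp == '#' && !seenHash) {
                seenHash = true; // first '#' starts the fragment
                out.append('#');
            } else if (isAsciiAlnum(cp) || (cp < 128 && ALLOWED.indexOf(cp) >= 0)) {
                out.appendCodePoint(cp);
            } else {
                // Encode the code point's UTF-8 bytes as %XX triplets.
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String raw = "http://example.com/my page|100%.html?q=caf\u00e9";
        String fixed = laxEscape(raw);
        // prints http://example.com/my%20page%7C100%25.html?q=caf%C3%A9
        System.out.println(fixed);
        // The escaped form now passes the strict JDK parser.
        System.out.println(new URI(fixed).getPath());
    }
}
```

This pass is deliberately component-blind: it does not distinguish path from query, and it does not touch the authority. A real lax mode inside HttpClient would probably need to apply per-component rules, as the workaround described above does by subclassing URI rather than pre-processing strings.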

See:
 
http://cvs.sourceforge.net/viewcvs.py/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/net/LaxURI.java?rev=1.2&view=auto
 
http://cvs.sourceforge.net/viewcvs.py/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/net/LaxURLCodec.java?rev=1.2&view=auto
 
http://cvs.sourceforge.net/viewcvs.py/archive-crawler/ArchiveOpenCrawler/src/java/org/apache/commons/httpclient/HttpMethodBase.java?rev=1.10&view=auto

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

