DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=36932>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

------- Additional Comments From [EMAIL PROTECTED] 2005-10-05 21:32 -------

We've hit the same thing with our Heritrix crawler, which uses HttpClient. IE and Firefox accept (and often send unaltered in HTTP requests) URIs that are incompletely or inconsistently encoded. Because such URIs work in browsers, they get used and never fixed, so valuable content that crawlers and other web applications want to access is available to browsers but not to HttpClient.

Under the same pragmatic philosophy that gives HttpClient its 'compatibility' cookie mode, HttpClient should have a 'lax/loose' URI option, to work better with prevalent real-world deviations from the URI specs.

Our current workaround, which may not be portable to other projects, includes a 'LaxURI' URI subclass, a 'LaxURLCodec', and a change to the HttpMethodBase(String uri) constructor to use LaxURI where it currently uses HttpClient's URI. See:

http://cvs.sourceforge.net/viewcvs.py/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/net/LaxURI.java?rev=1.2&view=auto
http://cvs.sourceforge.net/viewcvs.py/archive-crawler/ArchiveOpenCrawler/src/java/org/archive/net/LaxURLCodec.java?rev=1.2&view=auto
http://cvs.sourceforge.net/viewcvs.py/archive-crawler/ArchiveOpenCrawler/src/java/org/apache/commons/httpclient/HttpMethodBase.java?rev=1.10&view=auto
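
To illustrate what a 'lax' treatment of inconsistently encoded URIs might look like, here is a minimal, self-contained sketch. It is not the Heritrix LaxURI or LaxURLCodec code and does not use HttpClient's URI class; the class name LaxUriSketch and the method escapeLeniently are hypothetical, and the policy is an assumption made for illustration: leave well-formed %XX escapes and RFC 3986 reserved/unreserved characters alone, and percent-encode everything else as UTF-8, including a stray '%'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not the Heritrix or HttpClient implementation.
public class LaxUriSketch {

    // Characters passed through untouched: RFC 3986 unreserved + reserved sets.
    private static final String ALLOWED =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "-._~:/?#[]@!$&'()*+,;=";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Escapes only what is not already acceptable, so partially encoded
    // input is repaired instead of being rejected or double-encoded.
    public static String escapeLeniently(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean validEscape = c == '%'
                && i + 2 < raw.length()
                && isHex(raw.charAt(i + 1))
                && isHex(raw.charAt(i + 2));
            if (validEscape) {
                // Keep an existing, well-formed escape exactly as written.
                out.append(raw, i, i + 3);
                i += 3;
            } else if (ALLOWED.indexOf(c) >= 0) {
                out.append(c);
                i++;
            } else {
                // Everything else (spaces, stray '%', non-ASCII) is encoded as UTF-8.
                int cp = raw.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
                i += Character.charCount(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A raw space, an existing escape, and a stray '%' in one URI.
        System.out.println(escapeLeniently("http://example.com/a b%20c?q=100%"));
        // prints http://example.com/a%20b%20c?q=100%25
    }
}
```

A strict parser would typically reject the input above outright; the sketch instead normalizes it into something that can be sent on the wire. Whether a real 'lax' mode should repair such URIs or pass them through unaltered, as the browsers mentioned above often do, is a design choice this sketch does not settle.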
