[
https://issues.apache.org/jira/browse/HTTPCLIENT-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613996#action_12613996
]
oglueck edited comment on HTTPCLIENT-787 at 7/16/08 8:54 AM:
------------------------------------------------------------------
This is a server bug, not a client issue. And you have heard it before:
"HttpClient is not a browser". Also, as mentioned dozens of times before, URLs
cannot be escaped once they are represented as a series of bytes. Yes, in this
particular case it is sort of possible. But please consider:
a) we can't know the right encoding for the URL. UTF-8 is only a
recommendation. So assuming an ASCII-compatible encoding is arbitrary.
b) Replacing the space with %20 is correct for spaces in URI components (like
the path), but it is wrong for spaces in application/x-www-form-urlencoded
values of HTML forms (query string): those use a plus sign (+) to escape
spaces, as specified here:
http://www.w3.org/TR/html401/interact/forms.html#form-content-type
And we have no reason to assume that the query string uses
x-www-form-urlencoded. It could just use anything.
see also: http://marc.info/?l=httpclient-commons-dev&m=116859139319469&w=2
Sample: http://people.apache.org/~oglueck/composite path/servlet?composite name=composite value
is correctly escaped like so:
http://people.apache.org/~oglueck/composite%20path/servlet?composite+name=composite+value
c) you can implement your own redirect handler that can handle all sorts of
malformed responses
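
The sample in b) can be reproduced with the JDK alone, as long as the URL is
still available as separate, unescaped components. This is only a minimal
sketch: the class and method names are made up for illustration, UTF-8 is an
assumption (see a), and so is the idea that the query really is
x-www-form-urlencoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical helper, not part of HttpClient.
final class EscapeSample {

    // Form-encodes one name or value; a space becomes '+'.
    // Assumption: the server expects UTF-8.
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds an http URL from unescaped components.
    static String build(String host, String path, String name, String value) {
        try {
            // The multi-argument URI constructor quotes illegal characters
            // in the path, so a space becomes %20 there.
            URI base = new URI("http", host, path, null);
            return base.toASCIIString() + "?" + formEncode(name) + "=" + formEncode(value);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Calling EscapeSample.build("people.apache.org", "/~oglueck/composite path/servlet",
"composite name", "composite value") yields the escaped URL shown above. Note
that this only works because path and query are escaped separately, before
they are joined into one string.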
To be fair, for *most* servers out there this shouldn't be a problem, because:
a) they expect URI encodings to be UTF-8
b) they all have "compatible" (broken) parsers that allow + and %20 to be used
interchangeably
However, the really relevant point is: if the server does not even care to
escape the space character, it will most likely not escape any other non-URI
characters either, probably because of some careless programming. Such a server
or application grossly violates the HTTP protocol and should be considered
broken.
I would like to mark this issue as invalid.
Maybe a good thing to have would be a "CompatibilityRedirectHandler" that
imitates the convenient behaviour of popular browsers. Consider contributing
one.
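
As a rough starting point for such a handler, the lenient part could be
sketched like this. It is not HttpClient API; the class and method names are
hypothetical. It deliberately makes the arbitrary assumptions discussed above:
the Location value is treated as UTF-8, existing %XX escapes are left alone,
and a space in the query becomes %20 rather than +. A custom redirect handler
could run the Location header value through it before parsing.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for a browser-like redirect handler.
final class LenientLocation {

    // ASCII punctuation passed through unchanged: RFC 3986 reserved and
    // unreserved characters, plus '%' so existing escapes survive.
    private static final String SAFE = "-._~:/?#[]@!$&'()*+,;=%";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes every byte that is not a letter, digit or SAFE character.
    static String repair(String location) {
        StringBuilder out = new StringBuilder();
        // Assumption: the server meant UTF-8.
        for (byte b : location.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || SAFE.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0xF]);
            }
        }
        return out.toString();
    }

    // Parses the repaired value; still throws IllegalArgumentException
    // if the result is not a valid URI.
    static URI toUri(String location) {
        return URI.create(repair(location));
    }
}
```

Unlike a plain replaceAll(" ", "%20"), this also covers other unescaped
non-URI characters, but it cannot tell whether the server wanted + or %20 in
the query string.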
> Redirects with spaces in them are not handled correctly
> -------------------------------------------------------
>
> Key: HTTPCLIENT-787
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-787
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Reporter: Dave Clemmer
> Priority: Minor
>
> If a redirect address has spaces in it (yes, I know, the person creating that
> situation should be beaten, but, alas, that is not an option), they are not
> converted to %20 before opening, and, hence, fail to open.
> Changing line 107 of DefaultRedirectHandler to
> String location = locationHeader.getValue().replaceAll (" ", "%20");
> seems to fix it.