---- For original character sequences that contain non-ASCII characters, however, the situation is more difficult. Internet protocols that transmit octet sequences intended to represent character sequences are expected to provide some way of identifying the charset used, if there might be more than one [RFC2277]. However, there is currently no provision within the generic URI syntax to accomplish this identification. An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.
It is expected that a systematic treatment of character encoding within URI will be developed as a future modification of this specification. -----
So there's no right answer here. The IETF seems to be moving towards UTF-8 as the international charset, so we may as well use it. However, I have been unable to find a browser that correctly handles anything outside the ISO-8859-1 charset; double-byte characters are a really great way to screw things up.
So in essence: don't put non-ASCII characters in URLs, because there is no official way to support them. We should, however, give it a shot by using UTF-8, since it is "compatible" with ASCII anyway.
Regards,
Adrian Sutton.
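
The sense in which UTF-8 is "compatible" with ASCII can be checked directly: any pure-ASCII string produces the same bytes in both charsets, and only non-ASCII characters differ. The following is a minimal sketch using only the Java standard library; the class and method names are made up for illustration and are not part of HttpClient.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of HttpClient.
class Utf8AsciiCompat {
    // True if the string encodes to identical bytes in US-ASCII and UTF-8.
    static boolean sameBytes(String s) {
        return Arrays.equals(
                s.getBytes(StandardCharsets.US_ASCII),
                s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Pure ASCII input: both charsets give the same bytes.
        System.out.println(sameBytes("name=value"));
        // "\u00e4" is a-umlaut: one replacement byte in US-ASCII,
        // two bytes in UTF-8, so the arrays differ.
        System.out.println(sameBytes("\u00e4"));
    }
}
```

So switching the default from US-ASCII to UTF-8 should not change the encoding of parameters that were already plain ASCII.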
On Friday, July 11, 2003, at 03:11 AM, Oleg Kalnichevski wrote:
This is one of many 'shady' areas of the HTTP spec. Basically, there is no standard way for the client to tell the server which character encoding was used to encode the query parameters. I believe some browsers use the 'Accept-Charset' or 'Accept-Language' headers to negotiate the locale settings to be used by the server, but I am not sure if these headers can be used to determine which character encoding should be used to decode URL-encoded data.
I think we definitely should not be using US-ASCII by default. The
whole point of URL encoding is to escape non-ASCII characters. I suggest
UTF-8 be used by default.
Oleg
On Thu, 2003-07-10 at 17:48, Michael Becke wrote:
Hello Martin,
This is a good question, one that I am not positive I know the answer to. The HTTP request line (containing the query params) must be US-ASCII. That I am sure of. The catch is that form-urlencoding strings makes them ASCII, regardless of the original charset. So HttpMethod.setQueryString(NameValuePair[]) is assuming that the inputs (query params) are ASCII when really only the output (encoded params) should be ASCII.
The question is how does one determine, on the client and the server,
what the charset of the query params is? The request charset can be
specified with the Content-Type header, but this is meant to apply to
the request entity, not the headers. I have a feeling that we should
probably be using the content charset anyway. My reasoning here is that
an HTML form can be sent via a GET (query params) or POST (post content).
In both cases the content must be form urlencoded and my feeling is
that it should be done the same for both.
What does everyone else think?
Mike
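
To illustrate the point that only the output of form urlencoding needs to be ASCII, here is a rough sketch using java.net.URLEncoder from the standard library. The class and the encodePairs helper are hypothetical stand-ins, not the HttpClient implementation; the same string could serve as a GET query string or as a POST body, which is the symmetry suggested above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of HttpClient.
class FormEncodingSketch {
    // Form-urlencodes {name, value} pairs using the given charset.
    static String encodePairs(String[][] pairs, String charset) {
        StringBuilder out = new StringBuilder();
        try {
            for (String[] pair : pairs) {
                if (out.length() > 0) {
                    out.append('&');
                }
                out.append(URLEncoder.encode(pair[0], charset))
                   .append('=')
                   .append(URLEncoder.encode(pair[1], charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return out.toString();
    }

    // True if every character is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The input contains the umlauts a, o, u (written as escapes).
        String[][] params = { {"q", "\u00e4\u00f6\u00fc"}, {"lang", "de"} };
        String encoded = encodePairs(params, "ISO-8859-1");
        System.out.println(encoded);          // q=%E4%F6%FC&lang=de
        System.out.println(isAscii(encoded)); // true
    }
}
```

The input is not ASCII, but the encoded result is, so it is safe to place on the request line whichever charset was used for the escaping step.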
Martin Schnyder wrote:
When I use the GetMethod class to send text with special characters (German
Umlaute "äöü") in the request parameters, the special characters are not
encoded correctly. This happens when I use method
HttpMethodBase.setQueryString(NameValuePair[] params)
to set the query parameters.
I saw that Release 2.0 Beta 2 fixed that with bug fix 20481. Special
characters are now encoded differently but still wrong, as far as I can see.
Method HttpMethodBase.setQueryString(NameValuePair[]) calls
formUrlEncode(params, HttpConstants.HTTP_ELEMENT_CHARSET) to encode the
parameters. The value of HTTP_ELEMENT_CHARSET is US-ASCII. When I change the
charset to HttpConstants.DEFAULT_CONTENT_CHARSET (which is ISO-8859-1), the
German "Umlaute" are encoded correctly. I checked that with the code in CVS
HEAD. Is this a bug, or should only US-ASCII characters really be
supported in a request URI?
Regards, Martin Schnyder
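
The difference Martin describes can be reproduced without HttpClient. The sketch below percent-encodes the bytes of "äöü" under three charsets; percentEncode is a hypothetical, simplified helper that escapes everything except ASCII letters and digits, and is only meant to show what each charset does to the umlauts. It relies on the documented behaviour of String.getBytes(Charset), which replaces unmappable characters with the charset's replacement byte ('?' for US-ASCII).

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of HttpClient.
class UmlautEncoding {
    // Simplified percent-encoder: keeps ASCII letters and digits, escapes
    // every other byte as %XX. Real form encoding keeps a few more characters.
    static String percentEncode(String value, String charsetName) {
        byte[] bytes = value.getBytes(Charset.forName(charsetName));
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean alnum = (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (alnum) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc"; // a, o, u with umlauts
        // US-ASCII cannot represent the umlauts, so each becomes '?' (0x3F).
        System.out.println(percentEncode(umlauts, "US-ASCII"));   // %3F%3F%3F
        // ISO-8859-1 uses one byte per umlaut.
        System.out.println(percentEncode(umlauts, "ISO-8859-1")); // %E4%F6%FC
        // UTF-8 uses two bytes per umlaut.
        System.out.println(percentEncode(umlauts, "UTF-8"));      // %C3%A4%C3%B6%C3%BC
    }
}
```

With US-ASCII the original characters are lost before escaping even starts, which matches the "still wrong" result reported above; ISO-8859-1 and UTF-8 both preserve them but produce different escapes, so the server has to know which one was used.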
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]