I agree with you, but I do find that a lot of servers behave this way. Some
Location headers are in GBK and some are in UTF-8. The only thing I can do
is work around it in my own code.

I think HttpClient should provide a mechanism to handle this. HttpClient has a
dummy detector to detect the charset of the URL, which always returns
"US-ASCII", but it allows the user to override it.
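
To make the idea concrete, here is a rough sketch of the kind of detection an
application could plug in. It is not HttpClient code and uses only the JDK. The
class name, the method name and the sample characters in the URL are made up
for illustration. It tries a strict UTF-8 decode first and falls back to a
caller-supplied charset (GBK here) if the bytes are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
class LocationCharsetSketch {

    // Decode raw header bytes: strict UTF-8 first, then the fallback charset.
    static String decodeWithFallback(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently
                    // substituting replacement characters.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so assume the fallback charset.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        // Assumes the JDK in use ships the GBK charset.
        Charset gbk = Charset.forName("GBK");
        // Made-up URL with two Chinese characters in the path.
        String url = "http://www.abc.com/\u4e2d\u6587/hello/world";

        System.out.println(decodeWithFallback(url.getBytes(gbk), gbk).equals(url));
        System.out.println(decodeWithFallback(
                url.getBytes(StandardCharsets.UTF_8), gbk).equals(url));
    }
}
```

This is only a heuristic. Some GBK byte sequences are also valid UTF-8, so a
guess like this can be wrong, which is probably the main argument against
building it into the library.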

Feng

On 6/11/07, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:

On Mon, 2007-06-11 at 17:27 +0800, Feng Jiang wrote:
> Hi all,
>
> I think the implementation of HttpMethodParams#getHttpElementCharset() has a
> problem. By default, HttpClient chooses US-ASCII as the charset to
> decode HTTP elements, such as some headers.
>
> But I do meet some servers whose Location header is in some other
> charset, such as UTF-8, so HttpClient cannot handle the
> redirection correctly (in my application, I handle it myself). For
> example, one server responds with a header like this:
>
> Location: http://www.abc.com/****(some chinese character)/hello/world
>
> The above URL contains some Chinese characters in some other charset, such
> as GBK. The right approach for HttpClient would be: 1. detect the charset of
> the URL. 2. decode the URL in that charset to a java.lang.String. 3.
> construct the correct header instance.
>
> Am I right?
>

Not really. The use of non-ASCII characters in HTTP head elements (such
as headers or a request line) is a violation of the HTTP specification.
You can explicitly override the standard charset with a non-standard one
such as UTF-8 or GBK by setting the 'http.protocol.element-charset'
parameter, but I do not think HttpClient should attempt to 'guess' the
charset being used.
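
As a rough illustration of why the element charset matters, the sketch below
decodes the same raw Location bytes under two charsets. It does not call
HttpClient at all; it only mimics the decoding step with plain JDK classes. The
class name, method name and sample URL are invented for the example. The
commented-out line shows how the parameter named above might be set on a
client, assuming the generic setParameter call on the client params.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HttpClient.
class ElementCharsetDemo {

    // Decode raw header bytes using the charset named by the parameter value.
    // Bytes that are invalid in that charset become U+FFFD replacement chars.
    static String decodeElement(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Made-up URL; a server sending this in UTF-8 is violating the spec.
        String location = "http://www.abc.com/\u4e2d\u6587/hello/world";
        byte[] raw = location.getBytes(StandardCharsets.UTF_8);

        // Default element charset: the non-ASCII bytes are lost.
        System.out.println(decodeElement(raw, "US-ASCII").equals(location)); // false

        // Explicit override: the original URL is recovered.
        System.out.println(decodeElement(raw, "UTF-8").equals(location));    // true

        // With HttpClient itself the override would look something like this
        // (not executed here):
        // client.getParams().setParameter("http.protocol.element-charset", "UTF-8");
    }
}
```

The override only helps when the application already knows which charset the
server uses; it does nothing for a server that mixes GBK and UTF-8.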

For details see:

http://jakarta.apache.org/commons/httpclient/charencodings.html

Oleg

> Feng

