[
https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495384#comment-13495384
]
Oleg Kalnichevski commented on HTTPCLIENT-1257:
-----------------------------------------------
One can force HttpClient to ignore malformed and unmappable characters by
setting the "http.malformed.input.action" and "http.unmappable.input.action"
parameters:
http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/params/CoreProtocolPNames.html#HTTP_MALFORMED_INPUT_ACTION
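To illustrate what such an action does, here is a minimal sketch that uses only
the JDK's CharsetDecoder and does not call HttpClient itself. The two parameters
above are documented as taking a java.nio.charset.CodingErrorAction value; the
class name LenientHeaderDecoding, the helper decodeAscii and the shortened
sample path are assumptions made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class LenientHeaderDecoding {

    // Decodes raw bytes as US-ASCII and applies the given action to every
    // byte that is not valid ASCII (IGNORE drops it, REPLACE substitutes it,
    // REPORT fails the whole decode).
    static String decodeAscii(byte[] raw, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Only reachable with CodingErrorAction.REPORT.
            throw new IllegalArgumentException("Input is not valid US-ASCII", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the UTF-8 bytes of the Location path.
        byte[] raw = "/content/4052-\u0110\u00e1nh-gi\u00e1"
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeAscii(raw, CodingErrorAction.IGNORE));
        System.out.println(decodeAscii(raw, CodingErrorAction.REPLACE));
        try {
            decodeAscii(raw, CodingErrorAction.REPORT);
        } catch (IllegalArgumentException e) {
            System.out.println("REPORT rejected the input: " + e.getMessage());
        }
    }
}
```

Note that IGNORE and REPLACE only keep the decode from failing; the non-ASCII
characters are still lost, so this alone does not recover the original URL.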
Header is an interface, so you can provide a custom implementation of it backed
by a byte array instead of the CharArrayBuffer used internally by HttpClient.
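A rough sketch of the byte-array idea follows. It is deliberately standalone:
RawValueHeader is a hypothetical class that does not implement
org.apache.http.Header, and how the raw bytes would be captured from the
connection is not shown. A real version would implement the Header interface
and would need to be wired into response parsing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-backed header; a real one would implement
// org.apache.http.Header rather than stand alone.
final class RawValueHeader {
    private final String name;
    private final byte[] rawValue;

    RawValueHeader(String name, byte[] rawValue) {
        this.name = name;
        this.rawValue = rawValue.clone(); // defensive copy
    }

    String getName() {
        return name;
    }

    // Returns a copy of the undecoded value bytes.
    byte[] getRawValue() {
        return rawValue.clone();
    }

    // Decodes the value with a charset chosen by the caller.
    String getValue(Charset charset) {
        return new String(rawValue, charset);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the UTF-8 bytes the server sent.
        byte[] wire = "http://handheld.vn/content/4052-\u0110\u00e1nh-gi\u00e1"
                .getBytes(StandardCharsets.UTF_8);
        RawValueHeader location = new RawValueHeader("Location", wire);

        // Decoding as UTF-8 restores the intended URL.
        System.out.println(location.getValue(StandardCharsets.UTF_8));
        // Decoding byte-per-char yields mojibake like that in the report.
        System.out.println(location.getValue(StandardCharsets.ISO_8859_1));
    }
}
```

Keeping the bytes and deferring the charset decision to the caller is the whole
point of the design: nothing is lost until the caller picks an encoding.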
Oleg
> Header location automatically converted to ASCII even though location can
> contain UTF-8 encoded urls
> ----------------------------------------------------------------------------------------------------
>
> Key: HTTPCLIENT-1257
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1257
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Affects Versions: 4.2.2
> Reporter: Thibaut
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I'm trying to fetch:
> http://handheld.vn/content.php?4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch
> Which returns:
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "HTTP/1.1 303 See Other[\r][\n]"
> [main]
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "Date: Mon, 29 Oct 2012 17:55:57
> GMT[\r][\n]" [main]
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "Server: Apache[\r][\n]" [main]
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "Expires: Thu, 19 Nov 1981
> 08:52:00 GMT[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Cache-Control: no-store,
> no-cache, must-revalidate, post-check=0, pre-check=0[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Pragma: no-cache[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Set-Cookie: bb_lastactivity=0;
> expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Location:
> http://handheld.vn/content/4052-????nh-gi??-m??y-t??nh-b???ng-Kindle-Fire-HD-7-inch[\r][\n]"
> [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Length: 0[\r][\n]" [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "Connection: close[\r][\n]" [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Type: text/html[\r][\n]"
> [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "[\r][\n]" [main]
> 2012-10-29 18:54:29,357 DEBUG conn.DefaultClientConnection: Receiving
> response: HTTP/1.1 303 See Other [main]
> 2012-10-29 18:54:29,357 DEBUG http.headers: << HTTP/1.1 303 See Other [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Date: Mon, 29 Oct 2012
> 17:55:57 GMT [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Server: Apache [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Expires: Thu, 19 Nov 1981
> 08:52:00 GMT [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Cache-Control: no-store,
> no-cache, must-revalidate, post-check=0, pre-check=0 [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Pragma: no-cache [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Set-Cookie: bb_lastactivity=0;
> expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/ [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Location:
> http://handheld.vn/content/4052-Äánh-giá-máy-tÃnh-bảng-Kindle-Fire-HD-7-inch
> [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Content-Length: 0 [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Connection: close [main]
> 2012-10-29 18:54:29,359 DEBUG http.headers: << Content-Type: text/html [main]
> Unfortunately, I can't get the resolved URL through the following code:
> Header locationHeader = response.getFirstHeader("location");
> which will return
> http://handheld.vn/content/4052-Äánh-giá-máy-tÃnh-bảng-Kindle-Fire-HD-7-inch
> The header has already been decoded with the wrong character encoding, so I
> will never be able to get the redirect URL!
> I understand that this is not RFC-compliant behavior, but the above URL and
> redirect work fine in all browsers.
> Is it possible to access the raw header (byte array) so that I can choose the
> encoding on my own? This would help a lot. Alternatively, a parameter to
> optionally specify the encoding when fetching a header value would also work.