EntityUtils.toString should detect Byte order mark (BOM) and remove it if
present
---------------------------------------------------------------------------------
Key: HTTPCLIENT-1149
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1149
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient
Affects Versions: 4.1.2
Environment: Windows
Reporter: Ian Beaumont
Priority: Minor
EntityUtils.toString should detect and remove a byte order mark (BOM) at the
start of the input stream. Otherwise the BOM bytes are decoded as stray,
unwanted characters at the start of the returned string.
The possible byte order marks are listed here:
http://en.wikipedia.org/wiki/Byte_order_mark
I'm not sure whether EntityUtils.toString uses the BOM to try to detect the
encoding, but if it doesn't, then it should.
An example URL that causes this issue is the Microsoft Virtual Earth WSDL file:
// Fetch the WSDL and decode the response body to a String.
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet(
    "http://dev.virtualearth.net/webservices/v1/searchservice/searchservice.svc?wsdl");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
// textContents starts with the stray characters produced by the BOM.
String textContents = EntityUtils.toString(entity);
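
To make the request concrete, here is a minimal sketch of the kind of BOM
handling being asked for. It is not the HttpClient implementation: the class
BomSketch and its decode method are hypothetical names, it assumes Java 7 or
later (for StandardCharsets), and it only covers the UTF-8 and UTF-16 marks.
UTF-32 is left out because its little-endian mark begins with the same two
bytes as the UTF-16LE one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient or HttpCore.
final class BomSketch {

    private BomSketch() {
    }

    /**
     * Decodes the bytes to a String. If the data starts with a UTF-8 or
     * UTF-16 BOM, the BOM selects the charset and is dropped from the result;
     * otherwise the whole array is decoded with the fallback charset.
     */
    static String decode(byte[] data, Charset fallback) {
        Charset charset = fallback;
        int skip = 0;
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            charset = StandardCharsets.UTF_8;
            skip = 3;
        } else if (startsWith(data, 0xFE, 0xFF)) {
            charset = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (startsWith(data, 0xFF, 0xFE)) {
            charset = StandardCharsets.UTF_16LE;
            skip = 2;
        }
        return new String(data, skip, data.length - skip, charset);
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, the last line of the example above could become
BomSketch.decode(EntityUtils.toByteArray(entity), fallback), where fallback is
whatever charset the caller would otherwise have used.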