On Fri, 2011-04-01 at 18:14 -0400, Chad La Joie wrote:
> Here you go:
> http://shibboleth.net/dumps.tgz
>
> I found a much smaller document than the one I was initially testing
> with. It's off by one byte.
>
> On 4/1/11 9:38 AM, Oleg Kalnichevski wrote:
> > On Fri, 2011-04-01 at 09:06 -0400, Chad La Joie wrote:
> >> I'm experiencing an odd problem with HttpClient 4.1.1. I perform a GET
> >> on a document and then use the following code to get the bytes of the
> >> response entity (assuming a 200 status code):
> >>
> >> byte[] responseEntity = EntityUtils.toByteArray(response.getEntity());
> >>
> >> The problem I'm having is that this returns 16 fewer bytes than are
> >> actually in the document. So far I've checked:
> >> - that downloading the file via wget gives me the expected byte count
> >> - that the downloaded content is not compressed
> >>
> >> The document itself has a digital signature over it and this is failing
> >> to validate with the content as downloaded by HttpClient, but not with
> >> the document downloaded by wget so there is some material difference in
> >> the canonical form of the document (i.e., it's not just a lack of a new
> >> line at the start/end of the document).
> >>
> >> Any thoughts? Is EntityUtils.toByteArray not the right method to use to
> >> get the complete byte[] of the response entity?
> >>
> >> Thanks.
> >

This is a problem with content decoding.

<< "HTTP/1.1 200 OK[\r][\n]"
<< "Date: Fri, 01 Apr 2011 14:14:28 GMT[\r][\n]"
<< "Server: Apache/2.2.3 (Unix) mod_ssl/2.2.3 OpenSSL/0.9.7d[\r][\n]"
<< "Last-Modified: Thu, 31 Mar 2011 17:00:05 GMT[\r][\n]"
<< "ETag: "4d0e-39932f40"[\r][\n]"
<< "Accept-Ranges: bytes[\r][\n]"
<< "Content-Length: 19726[\r][\n]"
<< "Keep-Alive: timeout=5, max=99[\r][\n]"
<< "Connection: Keep-Alive[\r][\n]"
<< "Content-Type: application/xml[\r][\n]"
<< "[\r][\n]"
<< "<?xml version="1.0" encoding="UTF-8"?><!--[\n]"

The response content is clearly UTF-8 encoded. However, the Content-Type
header does not specify a charset. Per the HTTP specification, if the
charset is not explicitly set in the Content-Type header, the content
charset is assumed to be ISO-8859-1.

Oleg
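
To illustrate the charset point above, here is a minimal sketch using only
the JDK. It is not taken from HttpClient itself; the class name CharsetDemo,
the roundTrip helper and the sample payload are all hypothetical. It shows
that UTF-8 bytes decoded as ISO-8859-1 and then written back out no longer
match the original bytes, which would be enough to break a signature check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode the raw bytes with an assumed charset, then re-encode as UTF-8.
    // This mimics code that turns the entity into a String before use.
    static byte[] roundTrip(byte[] raw, Charset assumed) {
        String text = new String(raw, assumed);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical payload: U+00E9 occupies two bytes in UTF-8.
        byte[] raw = "<name>Andr\u00e9</name>".getBytes(StandardCharsets.UTF_8);

        byte[] asUtf8 = roundTrip(raw, StandardCharsets.UTF_8);
        byte[] asLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);

        System.out.println("raw bytes:              " + raw.length);
        System.out.println("decoded as UTF-8:       " + asUtf8.length);
        System.out.println("decoded as ISO-8859-1:  " + asLatin1.length);
    }
}
```

If the body really is needed as text, passing the charset explicitly (for
example EntityUtils.toString(entity, "UTF-8")) avoids relying on the
default. For signature validation it is usually safer to keep working with
the raw byte[] and let the XML parser honour the encoding declared in the
document.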