Then I guess I misunderstood what you were saying before. Are you suggesting then that the server is transcoding the file when it serves it up? And that the missing bytes actually go missing before HttpClient gets the response?
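To be clear about the client-side failure I had in mind, here is a minimal sketch of how decoding with the wrong charset can alter the bytes, while keeping them as a byte array (which is what EntityUtils.toByteArray is said to give you) cannot. It uses only the JDK, not HttpClient. The class name CharsetRoundTrip, the helper roundTrip, and the sample string are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Decode raw bytes with an assumed charset, then encode the result again.
    // If the assumption is wrong, the output can differ from the input.
    static byte[] roundTrip(byte[] raw, Charset assumed) {
        return new String(raw, assumed).getBytes(assumed);
    }

    public static void main(String[] args) {
        // Hypothetical payload: "cafe" with an acute accent on the e,
        // encoded as ISO-8859-1 (4 bytes, the last one is 0xE9).
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Correct assumption: the bytes survive the round trip.
        byte[] right = roundTrip(raw, StandardCharsets.ISO_8859_1);

        // Wrong assumption: 0xE9 is not valid UTF-8 on its own, so the
        // decoder substitutes a replacement character and the bytes change.
        byte[] wrong = roundTrip(raw, StandardCharsets.UTF_8);

        System.out.println("preserved with correct charset: " + Arrays.equals(raw, right));
        System.out.println("preserved with wrong charset:   " + Arrays.equals(raw, wrong));
    }
}
```

If the bytes are already wrong when read as a raw byte array, the damage would have to happen before the client decodes anything, which is what I am asking about.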

On 4/4/11 8:08 AM, Oleg Kalnichevski wrote:
> On Mon, 2011-04-04 at 07:34 -0400, Chad La Joie wrote:
>> Yeah, unfortunately that didn't work.
>>
>> Is there any way to get the old v3 behavior that gives you access to the
>> raw bytes of the entity before any sort of character decoding is done?
>>
>> I strongly suspect that very few web servers out there are properly
>> configured to return the correct character encoding so this could
>> definitely be an ongoing problem.
>>
>
> EntityUtils.toByteArray returns raw response content without attempting
> to decode it.
>
> http://hc.apache.org/httpcomponents-core-ga/httpcore/xref/org/apache/http/util/EntityUtils.html#81
>
> Oleg
>
>
>> On 4/2/11 6:29 AM, Oleg Kalnichevski wrote:
>>> On Sat, 2011-04-02 at 06:10 -0400, Chad La Joie wrote:
>>>> Okay, that makes sense.
>>>>
>>>> To test this, is there a way I can force the content type on the client
>>>> side, prior to requesting the response entity, via the response object?
>>>>
>>>
>>> You can try adding Accept and / or Accept-Charset header to the request
>>> message and see if the origin server responds appropriately.
>>>
>>> However, generally you might be better off using some sort of a content
>>> detection algorithm such that provided by Apache Tika toolkit. I suspect
>>> wget does exactly that.
>>>
>>> http://tika.apache.org/0.9/detection.html
>>> http://tika.apache.org/
>>>
>>> Oleg
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>

-- 
Chad La Joie
http://itumi.biz
trusted identities, delivered
