On Mon, 2011-04-04 at 07:34 -0400, Chad La Joie wrote:
> Yeah, unfortunately that didn't work.
>
> Is there any way to get the old v3 behavior that gives you access to the
> raw bytes of the entity before any sort of character decoding is done?
>
> I strongly suspect that very few web servers out there are properly
> configured to return the correct character encoding so this could
> definitely be an ongoing problem.
>

EntityUtils.toByteArray returns raw response content without attempting
to decode it.

http://hc.apache.org/httpcomponents-core-ga/httpcore/xref/org/apache/http/util/EntityUtils.html#81

Oleg

> On 4/2/11 6:29 AM, Oleg Kalnichevski wrote:
> > On Sat, 2011-04-02 at 06:10 -0400, Chad La Joie wrote:
> >> Okay, that makes sense.
> >>
> >> To test this, is there a way I can force the content type on the client
> >> side, prior to requesting the response entity, via the response object?
> >>
> >
> > You can try adding Accept and / or Accept-Charset header to the request
> > message and see if the origin server responds appropriately.
> >
> > However, generally you might be better off using some sort of a content
> > detection algorithm such that provided by Apache Tika toolkit. I suspect
> > wget does exactly that.
> >
> > http://tika.apache.org/0.9/detection.html
> > http://tika.apache.org/
> >
> > Oleg
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
>
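
A rough sketch of the approach discussed in this thread: take the raw bytes and decide on the charset yourself instead of trusting the server. With HttpClient 4.x the bytes would come from EntityUtils.toByteArray(response.getEntity()); to keep the example runnable on a plain JDK they are hard-coded here. The class name RawBytesDecoding, the decode helper, and the fallback policy are illustrative assumptions and not part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

class RawBytesDecoding {

    // Hypothetical helper: decode raw entity bytes with the charset the
    // server declared, if it is usable; otherwise use the caller's fallback.
    static String decode(byte[] raw, String declaredCharset, Charset fallback) {
        Charset charset = fallback;
        if (declaredCharset != null) {
            try {
                charset = Charset.forName(declaredCharset);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Declared charset is malformed or unknown: keep the fallback.
            }
        }
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Stand-in for: byte[] raw = EntityUtils.toByteArray(response.getEntity());
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Server declared nothing: decode with the fallback we chose.
        System.out.println(decode(raw, null, StandardCharsets.UTF_8));

        // Server declared the wrong charset: the text comes out garbled,
        // which is the misconfiguration problem described above.
        System.out.println(decode(raw, "ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

Because the bytes are kept undecoded, they could also be handed to a content detector such as Tika before a charset is chosen.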
