On Mon, 2011-04-04 at 07:34 -0400, Chad La Joie wrote:
> Yeah, unfortunately that didn't work.
> 
> Is there any way to get the old v3 behavior that gives you access to the
> raw bytes of the entity before any sort of character decoding is done?
> 
> I strongly suspect that very few web servers out there are properly
> configured to return the correct character encoding, so this could
> definitely be an ongoing problem.
> 

EntityUtils.toByteArray returns the raw response content as bytes,
without attempting any character decoding.

http://hc.apache.org/httpcomponents-core-ga/httpcore/xref/org/apache/http/util/EntityUtils.html#81
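As a rough sketch of what one might do with those raw bytes: once the
content is in hand as a byte[], the caller can pick the charset rather than
trusting the server's Content-Type header. The example below uses only the
JDK so that it runs on its own. The class name RawEntityDecoding and the
decode helper are hypothetical, made up for illustration, and are not part
of HttpClient. The EntityUtils call appears only in a comment and assumes an
HttpClient 4.x response object.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper class, for illustration only.
class RawEntityDecoding {

    // Decode raw entity bytes using the declared charset if it is usable,
    // otherwise fall back to a charset chosen by the caller.
    static String decode(byte[] raw, String declaredCharset, Charset fallback) {
        Charset charset = fallback;
        if (declaredCharset != null) {
            try {
                charset = Charset.forName(declaredCharset);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // The server sent a bogus or unknown charset name; keep the fallback.
            }
        }
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // With HttpClient 4.x the bytes would come from something like:
        //   byte[] raw = EntityUtils.toByteArray(response.getEntity());
        // Here a fixed UTF-8 payload stands in for the response body.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // No charset declared by the server: decode with the caller's choice.
        System.out.println(decode(raw, null, StandardCharsets.UTF_8));
    }
}
```

The fallback could equally come from a content detection step. The point
is only that the decision is made on the client side, after the bytes have
been read undecoded.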

Oleg


> On 4/2/11 6:29 AM, Oleg Kalnichevski wrote:
> > On Sat, 2011-04-02 at 06:10 -0400, Chad La Joie wrote:
> >> Okay, that makes sense.
> >>
> >> To test this, is there a way I can force the content type on the client
> >> side, prior to requesting the response entity, via the response object?
> >>
> > 
> > You can try adding Accept and / or Accept-Charset header to the request
> > message and see if the origin server responds appropriately.
> > 
> > However, generally you might be better off using some sort of content
> > detection algorithm, such as that provided by the Apache Tika toolkit. I
> > suspect wget does exactly that.
> > 
> > http://tika.apache.org/0.9/detection.html
> > http://tika.apache.org/
> > 
> > Oleg
> > 



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
