I'll see about changing getResponseBodyAsString() to use the charset from
the content-type (if it exists).  I'm up to my ears with day job work right
now, so it'll probably be a while before I can get to it.
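
For illustration only, here is a minimal sketch of what "use the charset from
the content-type (if it exists)" could look like. It is written against
current Java rather than the HttpClient code base, and the names
ContentTypeCharset, charsetOf and decode are hypothetical, not part of any
existing API. It assumes ISO-8859-1 as the fallback when the header carries no
usable charset parameter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
class ContentTypeCharset {

    // Returns the charset named in a Content-Type header value, or
    // ISO-8859-1 (assumed default) when the header is missing, has no
    // charset parameter, or names a charset this JVM does not support.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by ';'.
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = param.substring(8).trim();
                    // The value may be a quoted string; strip the quotes.
                    if (name.length() >= 2
                            && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported name: use default
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes raw response bytes using the charset from the header value.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(body, "text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```

The sketch does not handle whitespace around the "=" sign or other header
oddities; a real implementation would need a proper header parser.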

People still need to understand (and I'll improve the JavaDoc) that
getResponseBodyAsString() is never really going to be all that useful in the
real world.  From HttpClient's perspective the response body is simply a
sequence of bytes, nothing more.  It is up to a higher application layer to
actually *interpret* those bytes based on the MIME type specified in the
content-type header.
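
As a rough sketch of what that higher layer might do for HTML, the code below
follows the approach from the quoted message further down: decode the start of
the body provisionally as ISO-8859-1, look for a charset in a <META> tag, then
decode the whole byte array again with the charset it finds. The class name
MetaCharsetSniffer, the 4096-byte scan limit and the regular expression are
all assumptions made for illustration; real HTML needs a more careful parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical application-layer helper, not part of HttpClient.
class MetaCharsetSniffer {

    // Matches charset=NAME anywhere inside a <meta ...> tag, ignoring case.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([\\w.:+-]+)",
            Pattern.CASE_INSENSITIVE);

    // Assumed limit on how many leading bytes are scanned for the tag.
    private static final int SNIFF_LIMIT = 4096;

    // Returns the charset declared in a <meta> tag, or ISO-8859-1 if none
    // is declared or the declared name is not supported by this JVM.
    static Charset sniff(byte[] body) {
        int n = Math.min(body.length, SNIFF_LIMIT);
        // ISO-8859-1 maps every byte to exactly one char, so this
        // provisional decode cannot fail and keeps ASCII markup readable.
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported name: fall through to default.
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Re-decodes the original bytes with the sniffed charset.
    static String decode(byte[] body) {
        return new String(body, sniff(body));
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-8\"></head>"
                + "<body>caf\u00e9</body></html>";
        System.out.println(decode(html.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Note that the second decode goes back to the original bytes, not the
provisional string, which is why the caller has to keep the byte array around.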

Marc Saegesser 

> -----Original Message-----
> From: Rapheal Kaplan [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, March 20, 2002 1:53 PM
> To: Jakarta Commons Developers List
> Subject: Re: [HttpClient]Encoding
> 
> 
>   Makes sense to me.  Because the encoding is handled in the body itself,
> it doesn't necessarily help that much to set the encoding in the
> getResponseBodyAsString method.  Also, this kind of means that you can't
> rely on the getResponseBodyAsString method for all purposes.  There needs
> to be some other layer of a client application that manages encoding.
> 
>   I still see the use of get...AsString, of course.  It could be an
> in-between step that is sent to a parser to determine the actual encoding,
> but then you would need to return to the original byte stream anyway to
> re-string the body.  Maybe the documentation should reflect this
> information.
> 
>   Also, if people start using charset info in the future, it would
> probably be nice to provide support.  It might be that doing body-to-string
> conversion should be somewhere else in the API.  Any ideas?
> 
>   My first guess would be to have a utility class that can do the correct
> encoding, from both the header and maybe even parsing the content.
> However, I don't think I am familiar enough with the API to say decisively.
> 
>   I do know that such features might be very useful for some work that I
> need to do in the near future.  I am working on software that needs to
> interact with several languages with non-Latin character sets.
> 
>   - Rapheal Kaplan
> 
> 
> 
> On Wednesday 20 March 2002 14:27, you wrote:
> > I've had to deal with this problem myself.  Right now the only solution is
> > to use getResponseBody() and convert bytes into a string using the
> > appropriate encoding.  I like the idea of having getResponseBodyAsString()
> > use the encoding specified in the Content-Type header, but the problem is
> > that it still won't be very useful.
> >
> > The vast majority of web servers out there don't include a "; charset="
> > attribute in the content-type header or provide a reasonable mechanism for
> > content authors to cause the server to set the attribute correctly on a
> > per-file basis.  Most pages with non-ISO-LATIN-1 charsets use a <META
> > HTTP-EQUIV> tag in the HTML header to specify the page encoding.  That
> > means you still have to read at least part of the response body (as
> > ISO-LATIN-1) in order to determine the correct encoding.
> >
> > I don't have a problem with changing getResponseBodyAsString() to check the
> > content-type header, I just doubt that doing that will make it much more
> > useful in the real world.
> >
> > What do others think?
> >
> > Marc Saegesser
> >
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
