Re: Doing character encoding/decoding within libwww?

David Nesting Sun, 23 Sep 2007 08:48:50 -0700

On 9/22/07, Bjoern Hoehrmann <[EMAIL PROTECTED]> wrote:
>
> Generally speaking, this is rather difficult as some content may not be
> textual at all, and textual formats vary in how applications are to
> detect the encoding (e.g., XML has different rules than HTML, text/plain
> has no rules beyond looking at the charset parameter, and so on). If you
> want a general-purpose solution, a good start would be a module taking a
> HTTP::Response object and detecting the encoding, possibly decoding it
> on request.



Fortunately, we know the Content-Type by this point, so we can decide
whether it's appropriate to decode the body as text and, if so, how to go
about it.
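
To make that decision concrete, here is a rough sketch of the logic,
written in Python purely for illustration; it is not libwww code.  The
names is_textual and maybe_decode are hypothetical, and the ISO-8859-1
fallback is an assumption taken from HTTP/1.1's default for text/* with no
charset parameter.  It looks only at the charset parameter and ignores the
in-band rules that XML and HTML define.

```python
from email.message import Message


def is_textual(media_type: str) -> bool:
    """True for text/* and application/*+xml, the types named in this thread."""
    media_type = media_type.lower()
    return media_type.startswith("text/") or (
        media_type.startswith("application/") and media_type.endswith("+xml")
    )


def maybe_decode(content_type_header: str, body: bytes,
                 fallback: str = "iso-8859-1"):
    """Return str for textual media types, otherwise the original octets.

    Hypothetical helper: only the charset parameter is consulted, and
    `fallback` (an assumption) is used when that parameter is absent.
    """
    # Reuse the stdlib header parser to split the media type from its
    # parameters.  Note that it reports text/plain for a missing or
    # malformed Content-Type value.
    msg = Message()
    msg["Content-Type"] = content_type_header

    if not is_textual(msg.get_content_type()):
        return body  # not text: hand back the octets untouched

    charset = msg.get_content_charset() or fallback
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: fall back to octets.
        return body
```

With this, maybe_decode("text/html; charset=UTF-8", octets) would hand back
a text string, while an image/png body would come back as the same octets.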

HTML::Encoding seems to approach the problem reasonably well, but ideally
I'd like some day to be able to call LWP::Simple's get() and get back a
logical text string for text/* or application/*+xml.  Similarly,
getprint() should do the Right Thing with respect to my locale.  Users of
LWP::Simple can't invoke another layer of processing, even if they wanted
to.  So, today, it's either "get back octets that may or may not be useful
as text" or "use the full-blown LWP::UserAgent and add another layer (the
perhaps too specifically named HTML::Encoding) to make sure you get text
right."  It just seems like we can simplify that.

Thanks for the feedback.

David
