On 9/23/07, Bjoern Hoehrmann <[EMAIL PROTECTED]> wrote:
>
>
> Well that is necessarily so to keep the interface simple. Going from
> LWP::Simple::get to LWP::UserAgent->new->get(...) is easy enough to not
> warrant adding functionality to LWP::Simple.


My concern, though, is that with this approach, LWP::Simple isn't just
lacking features: it's harmful.  Users of LWP::Simple today cannot guarantee
that the octets they get are usable as text.  So long as applications use
it, these applications will never be properly internationalizable and we
will continue seeing new applications written that don't properly handle
character encodings.

> Actually that is not the case, there are plenty of, say, application/*
> formats, like the XML types, that carry encoding information in the
> header, without replicating it in the content (likewise, information in
> the content may not be replicated in the header, and the two may contra-
> dict each other).


I didn't notice that application/xml and +xml media types also made the HTTP
charset authoritative.  Basically, my thought is that if it follows these
rules (by placing it in the HTTP headers), it seems appropriate to decode it
as text.  Otherwise, the charset information will require some closer
inspection, but could easily be done by the caller even if they use
LWP::Simple.
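
To make that rule concrete, here is a minimal sketch. It assumes a
hypothetical helper, text_if_charset_declared (my name, not part of LWP),
that is handed the Content-Type header value and the raw octets. It decodes
only when the header names a charset that the core Encode module knows, and
otherwise returns nothing so the caller can inspect the content itself:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper, not part of LWP::Simple: return decoded text only
# when the Content-Type header declares a charset Encode recognizes.
sub text_if_charset_declared {
    my ($content_type, $octets) = @_;
    return unless defined $content_type && defined $octets;

    # Pull the charset parameter out of e.g. "text/html; charset=UTF-8".
    my ($charset) = $content_type =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i
        or return;

    # Unknown encoding name: leave the decision to the caller.
    my $enc = find_encoding($charset) or return;

    # No CHECK argument, so malformed input is substituted rather than fatal.
    return $enc->decode($octets);
}

# Header declares UTF-8, so the five octets decode to four characters.
my $text = text_if_charset_declared('text/plain; charset=UTF-8', "caf\xC3\xA9");

# No charset in the header: undef, and the caller has to look closer.
my $none = text_if_charset_declared('text/html', '<p>abc</p>');
```

How the caller gets hold of the header value is left open here; the sketch
only covers the decision of whether the octets can be treated as text.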


> Well, automagic decoding of content cannot be added to LWP::Simple with-
> out some opt-in switch as that would break a lot of programs, and if you
> require some opt-in, you might as well require switching the module.


That's certainly a good argument.  You could also just supplement its
methods with variants that attempt to return text instead of octets, and
deprecate or at least discourage the use of the other methods when you're
expecting text.  (It might be appropriate to print out a warning when an
octet-based method is used to fetch a textual media type.)

If LWP::Simple can't be easily changed to manage character encodings
cluefully, reasonably completely, and transparently to the caller, the
responsible thing to do would be to add some verbiage to its documentation
making this clear and discouraging its use altogether for retrieving text.

David
