On 9/23/07, Bjoern Hoehrmann <[EMAIL PROTECTED]> wrote:
> Well that is necessarily so to keep the interface simple. Going from
> LWP::Simple::get to LWP::UserAgent->new->get(...) is easy enough to not
> warrant adding functionality to LWP::Simple.

My concern, though, is that with this approach, LWP::Simple isn't just lacking features: it's harmful. Users of LWP::Simple today cannot guarantee that the octets they get are usable as text. So long as applications use it, these applications will never be properly internationalizable, and we will continue seeing new applications written that don't properly handle character encodings.

> Actually that is not the case, there are plenty of, say, application/*
> formats, like the XML types, that carry encoding information in the
> header, without replicating it in the content (likewise, information in
> the content may not be replicated in the header, and the two may
> contradict each other).

I didn't notice that the application/xml and +xml media types also made the HTTP charset authoritative. Basically, my thought is that if a resource follows these rules (by placing the charset in the HTTP headers), it seems appropriate to decode it as text. Otherwise, the charset information will require some closer inspection, but that could easily be done by the caller even if they use LWP::Simple.

> Well, automagic decoding of content cannot be added to LWP::Simple
> without some opt-in switch as that would break a lot of programs, and if
> you require some opt-in, you might as well require switching the module.

That's certainly a good argument. You could also just supplement its methods with variants that attempt to return text instead of octets, and deprecate or at least discourage the use of the other methods when you're expecting text. (It might be appropriate to print out a warning when an octet-based method is used to fetch a textual media type.)

If LWP::Simple can't be easily changed to manage character encodings cluefully, reasonably completely, and transparently to the caller, the responsible thing to do would be to add some verbiage to its documentation making this clear and discouraging its use altogether for retrieving text.

David
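
To illustrate the kind of text-returning variant suggested above, here is a minimal sketch. The helper names text_from_response and get_text are hypothetical and are not part of LWP::Simple. The sketch assumes a libwww-perl recent enough to provide HTTP::Response's decoded_content method, which decodes textual media types using the charset given in the Content-Type header. Exactly which media types it treats as text depends on the installed version, so take this as an illustration rather than a proposed patch.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper: turn a response into character data, or return
# undef if the request failed or the content could not be decoded.
sub text_from_response {
    my ($response) = @_;
    return unless $response->is_success;
    # decoded_content() undoes any Content-Encoding and, for textual
    # media types, decodes the octets using the declared charset.
    return $response->decoded_content;
}

# Hypothetical text-returning counterpart to LWP::Simple::get.
sub get_text {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new;
    return text_from_response($ua->get($url));
}
```

A caller would then write my $text = get_text($url); and receive a Perl character string rather than octets, which is the distinction the octet-based LWP::Simple::get leaves to the application.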