On 9/22/07, Bill Moseley <[EMAIL PROTECTED]> wrote:
>
> It's been a long day. What other mime types are you thinking of other
> than text/*?
>
The most complete implementation imaginable would start with at least these:

  text/html          (html-specific rules)
  text/xml           (xml-specific rules)
  text/*             (general-purpose text rules)
  application/*+xml  (xml-specific rules)

You'd probably also want this to be extensible, so that I can add my own media types at run-time to guarantee my non-obvious textual media type is handled properly.

On the other hand, I'm less convinced now that dipping into the HTML or XML content to figure out the proper encoding is necessarily the proper thing to do here. My complaint about LWP::Simple was that the HTTP Content-Type (charset) information is lost by the time it gets to the caller. If the data isn't in text at that point, it will never reliably get there. But for HTML and XML, if the character encoding is actually specified in the content rather than in the HTTP headers, then it isn't as important to deal with it up front.

I could see a case then for dealing with text/* only and returning octets for everything else, since text/* is the only media type that has character encoding details in the HTTP headers.

That being said, applications based on LWP::Simple are likely to work better with HTML and XML "assistance" for the reason I gave earlier: users of LWP::Simple probably aren't going to take the time to do the proper parsing and decoding. Yes, it's still "their fault" for not coding a robust application, but helping them do that is, I think, still a valid goal, if we can do it safely.

David
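
To make the media-type dispatch discussed above a little more concrete, here is a minimal sketch. It is written in Python rather than Perl purely for illustration, and it is not LWP's API: the names RULES, register_rule, parse_content_type, pick_rule and decode_text_only are all hypothetical, as are the rule labels. It assumes the first matching pattern wins, and that a text/* body with no charset in the header is left as octets.

```python
from fnmatch import fnmatchcase

# Ordered from most to least specific; the first matching pattern wins.
# The rule labels are hypothetical, not anything LWP defines.
RULES = [
    ("text/html", "html"),
    ("text/xml", "xml"),
    ("application/*+xml", "xml"),
    ("text/*", "text"),
]


def register_rule(pattern, rule):
    """Add a caller-supplied media type at run time, ahead of the defaults."""
    RULES.insert(0, (pattern.lower(), rule))


def parse_content_type(header):
    """Split a Content-Type header value into (media_type, charset or None)."""
    media_type, _, params = header.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"') or None
    return media_type.strip().lower(), charset


def pick_rule(media_type):
    """Return the rule label for a media type, or None if it is not textual."""
    for pattern, rule in RULES:
        if fnmatchcase(media_type, pattern):
            return rule
    return None


def decode_text_only(header, body):
    """The conservative option: decode text/* using the header charset,
    and return the raw octets for everything else."""
    media_type, charset = parse_content_type(header)
    if media_type.startswith("text/") and charset:
        return body.decode(charset)
    return body
```

With this shape, the "assistance" variant would branch on pick_rule() and look inside HTML or XML bodies for a declared encoding, while decode_text_only() never inspects the content at all. An application could call register_rule("application/json", "text") to have its own media type treated as textual.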