* Karl DeBisschop wrote:
>I just plugged in HTML::Parser 3.44 on my FC2 servers in order to handle
>utf-8 encoded content. (Boy was I glad to see that was available)
>
>But when running a robot, LWP::Protocol emits a warning as it works
>because the content stream is not decoded into perl's native character set.

See http://www.nntp.perl.org/group/perl.libwww/6017 and the relevant thread
for a recent discussion on this. Your patch has a number of problems.
Parsing the encoding out of the charset parameter is a bit more difficult
than your regular expression allows (e.g., the encoding name might be a
quoted-string, as in charset="utf-8"). The routine would now croak in
common cases such as an unsupported character encoding. And it fails to
deal with encodings such as ISO-2022-JP that maintain a state (see
Encode::PerlIO), or encodings such as UTF-8 where a character might be
longer than one octet (consider that one chunk has "Bj\xC3" and the next
chunk has "\xB6rn"; you need to know the \xC3 when decoding the \xB6).
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
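
To illustrate the chunk-boundary problem with the "Bj\xC3" / "\xB6rn" example, here is a minimal sketch of one way to carry undecoded octets from one chunk to the next using Encode's FB_QUIET check mode. It is not the patch under discussion and not how LWP does it; make_chunk_decoder is a hypothetical helper name made up for this example. It also does not address stateful encodings such as ISO-2022-JP, and a genuinely invalid octet would simply stay in the pending buffer, so real code would need a policy for that case.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of LWP or Encode): returns a closure
# that decodes successive chunks of octets in the given encoding.
sub make_chunk_decoder {
    my ($encoding) = @_;
    my $pending = '';    # octets not yet decoded, carried across calls

    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        # With FB_QUIET, decode() returns what it could process and
        # overwrites $pending with the unprocessed remainder, e.g. the
        # first octet of a character that is split across two chunks.
        return Encode::decode($encoding, $pending, Encode::FB_QUIET);
    };
}

# The two chunks from the example above.
my $decode = make_chunk_decoder('UTF-8');
my $text   = '';
$text .= $decode->($_) for "Bj\xC3", "\xB6rn";
```

The first call can only return "Bj" and holds back the \xC3; the second call sees "\xC3\xB6rn" and can finish the character. Note that Encode::decode still croaks on an unknown encoding name, so a caller would want to check the name first (for instance with Encode::find_encoding) rather than let the robot die.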