* Karl DeBisschop wrote:
>I just plugged in HTML::Parser 3.44 on my FC2 servers in order to handle 
>utf-8 encoded content. (Boy was I glad to see that was available)
>
>But when running a robot, LWP::Protocol emits a warning as it works 
>because the content stream is not decoded into perl's native character set.

See http://www.nntp.perl.org/group/perl.libwww/6017 and the relevant
thread for a recent discussion on this. Your patch has a number of
problems. Parsing the encoding out of the charset parameter is a bit
more difficult than your regular expression allows (e.g., the encoding
name might be a quoted-string, as in charset="utf-8"). The routine
would now croak in common cases such as an unsupported character
encoding. And it fails to deal with encodings that maintain state
across chunks, such as ISO-2022-JP (see Encode::PerlIO), or in which a
character may be longer than one octet, such as UTF-8: if one chunk
ends with "Bj\xC3" and the next begins with "\xB6rn", you need to know
the \xC3 when decoding the \xB6.
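
To make the three points concrete, here is a rough sketch. It is in
Python rather than Perl, purely as an illustration of the idea, and it
is not a replacement for the patch. The helper names
(charset_from_content_type, make_decoder, decode_chunks) are made up
for this example, and falling back to ISO-8859-1 when the charset is
missing or unsupported is an assumption, not something LWP does.

```python
import codecs
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Extract the charset parameter from a Content-Type header value.

    A real header parser is used instead of a regular expression, so a
    quoted-string such as charset="utf-8" is handled as well.
    The default is an assumption made for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


def make_decoder(name, fallback="iso-8859-1"):
    """Return an incremental decoder; never raise for an unknown encoding."""
    try:
        info = codecs.lookup(name)
    except LookupError:
        # Unsupported encoding: fall back instead of aborting (assumption).
        info = codecs.lookup(fallback)
    # errors="replace" keeps malformed input from raising mid-stream.
    return info.incrementaldecoder(errors="replace")


def decode_chunks(chunks, charset):
    """Decode an iterable of byte chunks, carrying state between chunks."""
    decoder = make_decoder(charset)
    for chunk in chunks:
        # The decoder buffers an incomplete trailing sequence (such as a
        # lone \xC3) and completes it when the next chunk arrives.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush whatever is still buffered at the end of the stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

With the chunks from the example above, decode_chunks([b"Bj\xc3",
b"\xb6rn"], "utf-8") yields "Bj" and then "örn": the \xC3 is held back
until the \xB6 arrives. Decoding each chunk on its own would instead
give an error or a replacement character at the chunk boundary.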
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
