Bill Moseley wrote:
> So, in general, I would bring character data into Perl like:
>
>   my $characters = $response->decoded_content;
>
> Then you work with $characters as needed.
>
> And then when you want to output you convert back to whatever encoding
> you need:
>
>   $utf8_octets = encode_utf8( $characters );
>
>   send_to_client( $utf8_octets );
>
> For your case you might try $tree->parse( $response->decoded_content
> ); Or, if you have raw utf-8 octets that you need to parse I think
> you can call $tree->utf8_mode( 1 ) to tell the parser to decode. But,
> I'd prefer the first.
>

That seems to be a good idea. I only have to make some modifications, because incoming documents do not always use the same encoding. It can be Latin-1, UTF-8, or something else, and the people who create the web pages are not always that precise. That's why HTML::Parser is such a good choice in these cases: it is tolerant.
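To make the "not always the same encoding" case concrete, here is a minimal sketch of how the decode step could fall back when the charset is missing or wrong. It uses only the core Encode module. The helper name octets_to_characters and the fallback order (declared charset, then strict UTF-8, then Latin-1) are my own assumptions for illustration, not something LWP or HTML::Parser provides.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8 find_encoding);

# Hypothetical helper: turn raw octets into Perl characters.
# $declared is whatever charset the document claims; it may be undef or bogus.
sub octets_to_characters {
    my ($octets, $declared) = @_;

    # Try the declared charset first, then strict UTF-8.
    for my $name (grep { defined } $declared, 'UTF-8') {
        next unless find_encoding($name);    # skip names Encode does not know
        my $copy  = $octets;                 # decode() with a CHECK value may modify its input
        my $chars = eval { decode($name, $copy, Encode::FB_CROAK) };
        return $chars if defined $chars;
    }

    # Last resort: Latin-1 maps every octet to a code point, so it cannot fail
    # (but it will produce wrong characters if the guess is wrong).
    return decode('ISO-8859-1', $octets);
}

# Work with characters internally, encode again only on output.
my $characters  = octets_to_characters("caf\xC3\xA9", undef);
my $utf8_octets = encode_utf8($characters);
```

Note that a page that declares Latin-1 but actually contains UTF-8 will still decode "successfully" with the wrong result, so this only helps when the declaration is missing or unknown.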

I thought that not touching the encoding would be the best idea, but decoding characters with code points higher than 255 seems to be better. It might also be a good idea to use $response->decoded_content and encode the content again later, at least if $response always provides a ->content_charset.

Thank you.

Best regards,
Oliver Block