Hello everyone,

The following code loads a web page from a web server and parses it into
an HTML tree. At the end, a variable is assigned the string representation
of that tree.

        use LWP::UserAgent;
        use HTML::TreeBuilder;

        my $ua = LWP::UserAgent->new;
        my $response = $ua->get($form->{'url'});   # $form is set up elsewhere (not shown)
        die $response->status_line unless $response->is_success;

        my $tree = HTML::TreeBuilder->new();
        $tree->parse($response->content);
        $tree->eof;                                 # tell the parser the input is complete

        # ...
        # encoding of content of $tree is ISO-8859-1 at this point
        $template = $tree->as_HTML('<>&');

        # encoding of content of $template is UTF-8
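
To see what as_HTML does with non-ASCII characters independently of LWP,
a sketch like the following may help. It parses a hypothetical inline
sample (standing in for $response->content) that contains the entity
&auml;, and compares as_HTML('<>&') with as_HTML called without an entity
list. According to the HTML::Element documentation, a defined $entities
argument restricts escaping to exactly those characters, so other
characters are emitted literally; without it, all unsafe characters,
including non-ASCII ones, are written as entities.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample standing in for $response->content; &auml; is U+00E4.
my $tree = HTML::TreeBuilder->new_from_content('<p>M&auml;rz</p>');

# Defined entity list: only <, > and & are escaped, so U+00E4 stays literal.
my $narrow = $tree->as_HTML('<>&');

# No entity list: unsafe characters, including non-ASCII ones, become entities.
my $default = $tree->as_HTML();

# Free the tree, as the HTML::Tree documentation recommends.
$tree->delete;

printf "as_HTML('<>&') keeps literal U+00E4: %s\n", ($narrow  =~ /\x{e4}/ ? 'yes' : 'no');
printf "as_HTML()      keeps literal U+00E4: %s\n", ($default =~ /\x{e4}/ ? 'yes' : 'no');
```

If the literal character in the first string is what later shows up as
UTF-8, that would point at the output step rather than at the tree itself.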

Now the following problem arises: the encoding of the content of
$template (UTF-8) is not the same as that of $tree (ISO-8859-1).
So as_HTML evidently converts the content to UTF-8.
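
One point worth checking: a Perl string produced by the parser holds
characters, not bytes, and which byte encoding it ends up in is decided
only when it is encoded or written out. A minimal sketch using the core
Encode module, with a hypothetical sample string, serializes the same
text both ways:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: "März" as a Perl character string, the form a
# parsed text node takes once &auml; has been decoded to U+00E4.
my $text = "M\x{e4}rz";

# Serialize the same characters into two different byte encodings.
my $latin1 = encode('iso-8859-1', $text);   # one byte per character
my $utf8   = encode('UTF-8', $text);        # U+00E4 needs two bytes

# Show each result as hex bytes.
for my $pair (['ISO-8859-1', $latin1], ['UTF-8', $utf8]) {
    my ($name, $bytes) = @$pair;
    printf "%-10s %s\n", $name, join ' ', map { sprintf '%02X', ord } split //, $bytes;
}
# prints:
# ISO-8859-1 4D E4 72 7A
# UTF-8      4D C3 A4 72 7A
```

If $template is printed to a filehandle, an output layer such as
binmode(STDOUT, ':encoding(ISO-8859-1)') would fix the output encoding in
the same way; whether that matches how $template is used here is an
assumption.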

Is this a known behavior of as_HTML? Will it be changed?

Best Regards,

Oliver Block
