[EMAIL PROTECTED] writes:

> LWP seems to have issues with fetching pages that are utf-8 encoded.
> 
> Using a simple script like
> 
>     use LWP::UserAgent;
>     use Encode;
> 
>     my $ua = LWP::UserAgent->new();
>     my $resp = $ua->get("http://bild.de");
> 
>     if(Encode::is_utf8($resp->content)) {
>         print "utf8\n";
>     } else {
>         print "no utf8\n";
>     }
> 
> shows
> 
>     "no utf8"
> 
> (meaning that although the page is utf-8 encoded, the resulting Perl string
> isn't) and it prints the warning
> 
>     Parsing of undecoded UTF-8 will give garbage when decoding entities
>     at .../LWP/Protocol.pm line 114.
> 
> which seems to be related to a message I posted last year:
> 
>     http://www.nntp.perl.org/group/perl.libwww/2006/08/msg6801.html
> 
> although there were no responses at the time.

This is indeed an outstanding bug that I don't have any good fix for
yet, but you can work around the warning by setting the 'parse_head'
attribute to FALSE:

   my $ua = LWP::UserAgent->new(parse_head => 0);

Extracting the content with the

   $resp->decoded_content

method will decode the UTF-8, provided there is a proper header (a
Content-Type with a charset) to be found in the HTTP response.
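
A minimal sketch of the difference between the two accessors, assuming
a hand-built HTTP::Response stands in for a real fetch (the sample body
and header below are made up for illustration, so it runs offline):

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;
use Encode ();

# parse_head => 0 turns off the head parser that emits the warning.
# A real fetch would then be: my $resp = $ua->get($url);
my $ua = LWP::UserAgent->new(parse_head => 0);

# Illustrative stand-in for a fetched page: UTF-8 bytes in the body,
# charset declared in the Content-Type header.
my $resp = HTTP::Response->new(
    200, "OK",
    [ "Content-Type" => "text/html; charset=UTF-8" ],
    "<p>Gr\xC3\xBC\xC3\x9Fe</p>",
);

# content() returns the raw bytes exactly as received.
my $bytes = $resp->content;

# decoded_content() returns a character string, decoded according to
# the declared charset; it returns undef if decoding fails.
my $text = $resp->decoded_content;
die "could not decode content\n" unless defined $text;

printf "content:         %d bytes, is_utf8=%d\n",
    length($bytes), Encode::is_utf8($bytes) ? 1 : 0;
printf "decoded_content: %d chars, is_utf8=%d\n",
    length($text), Encode::is_utf8($text) ? 1 : 0;
```

Note that Encode::is_utf8 only reports Perl's internal string flag, so
it says nothing about how the page itself was encoded on the wire.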

--Gisle
