Thanks for your advice. The output does look different this time, but it still doesn't look like UTF-8 (I get the same error with recode).

If somebody could suggest a way to convert to another encoding, or a better way to identify the encoding of each page, that would also be fine: once I have control over the encodings, I think I can find some way to convert back to UTF-8 (e.g., via recode).
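One common approach, sketched here under stated assumptions rather than as a tested recipe: bytes that decode cleanly as strict UTF-8 are rarely in a legacy charset by accident, so try UTF-8 first and fall back to a charset the caller supplies (for example one read from the page's Content-Type header). decode_page is a hypothetical helper, and the iso-8859-1 default is an assumption. The core Encode::Guess module is another option, though its documentation cautions that single-byte encodings such as the ISO-8859 series do not guess well.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper, not part of Encode: try a strict UTF-8 decode
# first, then fall back to a caller-supplied charset (assumed default:
# iso-8859-1). Returns the decoded character string and the charset used.
sub decode_page {
    my ($bytes, $fallback) = @_;
    $fallback = 'iso-8859-1' unless defined $fallback;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes unmodified so it can still be used for the fallback.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ($chars, 'UTF-8') if defined $chars;

    # With the default CHECK, decode() substitutes instead of dying, so a
    # wrong fallback charset yields wrong characters, not an error.
    return (decode($fallback, $bytes), $fallback);
}

my ($text1, $enc1) = decode_page("caf\xC3\xA9");   # valid UTF-8 bytes
my ($text2, $enc2) = decode_page("caf\xE9");       # not valid UTF-8
print "$enc1 $enc2\n";                             # prints "UTF-8 iso-8859-1"
```

Once the text is a character string, printing it through binmode(STDOUT, ":utf8") produces UTF-8 output without a separate recode step.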

Thanks again,

Marco

On Saturday, May 8, 2004, at 05:16 Europe/Rome, Edward Batutis wrote:

Marco:

I think you are converting twice:

# output will be utf8
binmode(STDOUT, ":utf8");
...
                from_to($html_text,$charset,"utf8");
...

Here, $html_text is encoded to UTF-8 a second time, because the :utf8 layer on STDOUT treats each byte produced by from_to() as a separate character:

print "CURRENT URL $url\n$html_text\n";

I think you can just remove the binmode line and it will work.
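A minimal sketch of the double conversion, using only the core Encode module. The page text ("café" in iso-8859-1) and the names $raw, $octets, $double and $chars are hypothetical, chosen for illustration. encode('UTF-8', ...) is applied to the from_to() output to mimic what the :utf8 layer does when it prints a byte string; decode() is shown as the alternative that lets the binmode line stay.

```perl
use strict;
use warnings;
use Encode qw(from_to decode encode);

# Hypothetical page body: "cafe" with e-acute, as iso-8859-1 bytes (0xE9).
my $charset = 'iso-8859-1';
my $raw     = "caf\xE9";

# Approach A (as in the original script): from_to() rewrites the byte
# string in place, so $octets now holds the UTF-8 bytes "caf\xC3\xA9".
my $octets = $raw;
from_to($octets, $charset, 'utf8');

# Printing those bytes through a :utf8 layer treats each byte as a
# Latin-1 character and encodes it again; encode() reproduces that here.
my $double = encode('UTF-8', $octets);    # "caf\xC3\x83\xC2\xA9": 7 bytes, not 5

# Approach B: decode() returns a character string ("caf\x{E9}"), and the
# :utf8 layer then performs the one and only encoding step.
my $chars = decode($charset, $raw);

binmode(STDOUT, ':utf8');
print "$chars\n";    # correct UTF-8 on the terminal
```

Either approach works on its own: keep from_to() and write to a plain byte-oriented STDOUT, or keep the :utf8 layer and decode() instead. The bug is combining from_to() with a :utf8 output layer.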


Why do encodings always cause so much pain?

I hope this helps today's pain, at least :-).


Regards,

=Ed




