David E. Wheeler wrote on 15.06.2010 at 22:55 (-0700):
> 
> But the curious thing is, when I pull the offending string out of
> the RSS and just stick it in a script, Encode knows how to decode it
> properly, while XML::LibXML (and my Unicode-aware editors) cannot.

Try passing the parser options as a hash reference:

  my $doc = $parser->parse_html_string($str, {encoding => 'utf-8'});

To print Unicode text strings (as opposed to octet strings) correctly
to a UTF-8 terminal, add the following line before the first output:

  binmode STDOUT, ':utf8';

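But note that STDOUT is global, so this layer applies to everything
the program prints to STDOUT, including output from other modules.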
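A minimal sketch of the octet/text distinction, using only the core
Encode module. The sample string is hypothetical (it is not the string
from the RSS feed); the point is that decode() turns UTF-8 octets into
a text string, and the :utf8 layer encodes it again on output:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 octets for "Grüße" (not the feed's string)
my $octets = "Gr\xC3\xBC\xC3\x9Fe";

# Decode the octets into a Perl text (character) string
my $text = decode('UTF-8', $octets);

# Octet strings count bytes, text strings count characters
printf "octets: %d, characters: %d\n", length $octets, length $text;
# prints "octets: 7, characters: 5"

# Encode text strings as UTF-8 on their way out to STDOUT
binmode STDOUT, ':utf8';
print $text, "\n";
```

Without the binmode line, Perl would write ü and ß here as single
Latin-1 bytes, which a UTF-8 terminal displays as garbage.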

Hope this helps!
-- 
Michael Ludwig
