David E. Wheeler wrote on 15.06.2010 at 22:55 (-0700):
>
> But the curious thing is, when I pull the offending string out of
> the RSS and just stick it in a script, Encode knows how to decode it
> properly, while XML::LibXML (and my Unicode-aware editors) cannot.

Try passing the parser options as a hash reference:

  my $doc = $parser->parse_html_string($str, {encoding => 'utf-8'});

To print Unicode text strings (as opposed to octet strings) correctly
to a terminal (UTF-8 or not), add the following line before the first
output:

  binmode STDOUT, ':utf8';

But note that STDOUT is global, so the layer applies to everything
printed to it afterwards, including output from other modules.

Hope this helps!

-- 
Michael Ludwig
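
A minimal sketch of the text-string versus octet-string distinction mentioned above, using only the core Encode module. The sample string and the variable names are made up for illustration and are not taken from the original feed:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "Tomás" as raw UTF-8 octets, as it might
# arrive from a file or socket before any decoding.
my $octets = "Tom\xC3\xA1s";

# Decode the octets into a Perl text string (characters, not bytes).
my $text = decode('UTF-8', $octets);

# The octet string is one element longer, because "á" takes two
# octets in UTF-8 but is a single character once decoded.
my $octet_len = length $octets;
my $char_len  = length $text;

# Encode text strings as UTF-8 on the way out; this affects all
# later prints to STDOUT, since the handle is global.
binmode STDOUT, ':utf8';
print "$text ($char_len characters, $octet_len octets)\n";
```

If the two lengths are equal for a string that contains non-ASCII characters, it has most likely not been decoded yet, or has been decoded twice.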