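The output from your example looks like UTF-8 data. Ã (byte 0xC3) is a
common giveaway: it is the first byte of the UTF-8 encoding of every
character from U+00C0 to U+00FF, including é (0xC3 0xA9). XML::Parser
converts all incoming text into UTF-8, whatever encoding the document
declares, so you will need to convert it back to ISO-8859-1.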
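To make the diagnosis concrete, here is a small sketch using the core Encode module. Encode is core only since Perl 5.8, so whether it is available in your setup is an assumption, and the helper name hex_bytes is hypothetical. It shows that é is the single byte 0xE9 in Latin-1 but the two bytes 0xC3 0xA9 in UTF-8. Read as Latin-1, those two bytes are Ã and ©, which is why encode_entities produced entities for Ã and © in the quoted output.

```perl
#!/usr/bin/perl -w
use strict;
use Encode qw(encode);

# Hypothetical helper: show each byte (or character) of a string as hex.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $s);
}

my $latin1 = chr(0xE9);                 # e-acute as one Latin-1 character
my $utf8   = encode('UTF-8', $latin1);  # the same character encoded as UTF-8

print hex_bytes($latin1), "\n";         # E9
print hex_bytes($utf8),   "\n";         # C3 A9 (0xC3 is Ã and 0xA9 is © in Latin-1)
```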
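My favorite is Text::Iconv:

  use Text::Iconv;
  # Build the converter once; reuse it for every buffer.
  my $converter = Text::Iconv->new("UTF-8", "ISO-8859-1");
  # Turn the UTF-8 text collected from XML::Parser into Latin-1 bytes.
  my $buffer_latin1 = $converter->convert($buffer);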
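If you are on Perl 5.8 or later (an assumption; Encode is not core before that), the core Encode module can do the same UTF-8 to ISO-8859-1 conversion without Text::Iconv. A minimal sketch follows; the helper name utf8_to_latin1 is hypothetical. Passing Encode::FB_CROAK makes both steps die on malformed input or on characters ISO-8859-1 cannot represent, instead of quietly substituting a placeholder.

```perl
#!/usr/bin/perl -w
use strict;
use Encode qw(decode encode);

# Hypothetical helper: convert UTF-8 bytes (like the XML::Parser buffer)
# to ISO-8859-1 bytes. FB_CROAK makes both steps die on malformed UTF-8
# or on characters with no ISO-8859-1 equivalent, instead of substituting.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;    # local copy; decode may consume it in place
    my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK);
    return encode('ISO-8859-1', $chars, Encode::FB_CROAK);
}

my $buffer_latin1 = utf8_to_latin1("\xC3\xA9");   # UTF-8 bytes for e-acute
print unpack('H*', $buffer_latin1), "\n";         # e9
```

Unlike the pack/unpack workaround in the quoted message, this fails loudly on text such as the euro sign that ISO-8859-1 has no byte for.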
On Tue, May 07, 2002 at 10:51:10AM -0400, John Siracusa wrote:
> I ran into this problem during mod_perl development, and I'm posting it to
> this list hoping that other mod_perl developers have dealt with the same
> thing and have good solutions :)
>
> I've found that strings collected while processing XML using XML::Parser do
> not play nice with the HTML::Entities module. Here's the sample program
> illustrating the problem:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> use HTML::Entities;
> use XML::Parser;
>
> my $buffer;
>
> my $p = XML::Parser->new(Handlers => { Char => \&xml_char });
>
> my $xml = '<?xml version="1.0" encoding="iso-8859-1"?><test>' .
> chr(0xE9) . '</test>';
>
> $p->parse($xml);
>
> print encode_entities($buffer), "\n";
>
> sub xml_char
> {
>     my($expat, $string) = @_;
>
>     $buffer .= $string;
> }
>
> The output unfortunately looks like this:
>
> &Atilde;&copy;
>
> Which makes very little sense, since the correct entity for 0xE9 is:
>
> &eacute;
>
> My current work-around is to run the buffer through a (lossy!?) pack/unpack
> cycle:
>
> my $buffer2 = pack("C*", unpack("U*", $buffer));
> print encode_entities($buffer2), "\n";
>
> This works and prints:
>
> &eacute;
>
> I hope it is not lossy when using iso-8859-1 encoding, but I'm guessing it
> will maul UTF-8 or UTF-16. This seems like quite an evil hack.
>
> So, what is the Right Thing to do here? Which module, if any, is at fault?
> Is there some combination of Perl Unicode-related "use" statements that will
> help me here? Has anyone else run into this problem?
>
> -John
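On the pack/unpack question in the quoted message: unpack("U*") yields Unicode code points and pack("C*") stores each one as a single byte, so the round trip is in effect a conversion to Latin-1. pack("C*") can only hold values up to 0xFF; anything higher (the euro sign, CJK text) is wrapped as if you had written value & 255, with a warning. A minimal guard you could run first, using the hypothetical helper name fits_latin1:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: true if every character is in U+0000..U+00FF,
# the range pack("C*", ...) can store one per byte without wrapping.
sub fits_latin1 {
    my ($s) = @_;
    return $s !~ /[^\x00-\xFF]/;
}

print fits_latin1("caf\x{E9}")   ? "safe\n" : "lossy\n";   # safe
print fits_latin1("\x{20AC}100") ? "safe\n" : "lossy\n";   # lossy
```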
--
Paul Lindner [EMAIL PROTECTED]
mod_perl Developer's Cookbook http://www.modperlcookbook.org/
Human Rights Declaration http://www.unhchr.ch/udhr/