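The output from your example looks like UTF-8 data being read as iso-8859-1 (Ã is what the lead byte of a two-byte UTF-8 sequence looks like when displayed as Latin-1). XML::Parser converts all incoming text into UTF-8, whatever encoding the document declares. You will need to convert it back to iso-8859-1 before passing it to encode_entities.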
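To make the byte-level picture concrete, here is a minimal sketch of that conversion using the Encode module that ships with Perl 5.8 and later. It assumes the buffer holds raw UTF-8 bytes; the helper name utf8_to_latin1 is made up for illustration and is not part of any module. If your buffer is already a Perl character string, the encode step alone should be enough.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from any module): take a string of UTF-8
# bytes and return the same text as ISO-8859-1 bytes.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);    # bytes -> Perl characters
    return encode('ISO-8859-1', $chars);         # characters -> Latin-1 bytes
}

# e-acute (U+00E9) is the two bytes C3 A9 in UTF-8 and the single
# byte E9 in ISO-8859-1.
my $utf8_bytes = "\xC3\xA9";
my $latin1     = utf8_to_latin1($utf8_bytes);

printf "UTF-8 bytes:   %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
printf "Latin-1 bytes: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $latin1;
```

Characters that have no ISO-8859-1 equivalent cannot survive this step, so it is only safe when the source document really was Latin-1.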
My favorite is Text::Iconv:

    use Text::Iconv;

    my $converter = Text::Iconv->new("UTF-8", "ISO8859-1");
    my $buffer_latin1 = $converter->convert($buffer);

On Tue, May 07, 2002 at 10:51:10AM -0400, John Siracusa wrote:
> I ran into this problem during mod_perl development, and I'm posting it to
> this list hoping that other mod_perl developers have dealt with the same
> thing and have good solutions :)
>
> I've found that strings collected while processing XML using XML::Parser do
> not play nice with the HTML::Entities module. Here's the sample program
> illustrating the problem:
>
>     #!/usr/bin/perl -w
>
>     use strict;
>
>     use HTML::Entities;
>     use XML::Parser;
>
>     my $buffer;
>
>     my $p = XML::Parser->new(Handlers => { Char => \&xml_char });
>
>     my $xml = '<?xml version="1.0" encoding="iso-8859-1"?><test>' .
>               chr(0xE9) . '</test>';
>
>     $p->parse($xml);
>
>     print encode_entities($buffer), "\n";
>
>     sub xml_char
>     {
>       my($expat, $string) = @_;
>
>       $buffer .= $string;
>     }
>
> The output unfortunately looks like this:
>
>     &Atilde;&copy;
>
> Which makes very little sense, since the correct entity for 0xE9 is:
>
>     &eacute;
>
> My current work-around is to run the buffer through a (lossy!?) pack/unpack
> cycle:
>
>     my $buffer2 = pack("C*", unpack("U*", $buffer));
>     print encode_entities($buffer2), "\n";
>
> This works and prints:
>
>     &eacute;
>
> I hope it is not lossy when using iso-8859-1 encoding, but I'm guessing it
> will maul UTF-8 or UTF-16. This seems like quite an evil hack.
>
> So, what is the Right Thing to do here? Which module, if any, is at fault?
> Is there some combination of Perl Unicode-related "use" statements that will
> help me here? Has anyone else run into this problem?
>
> -John

-- 
Paul Lindner
[EMAIL PROTECTED]

mod_perl Developer's Cookbook   http://www.modperlcookbook.org/
Human Rights Declaration        http://www.unhchr.ch/udhr/