Hello,
I have a problem, apparently an encoding issue, but I can't figure out
where it comes from. Could someone please help?
I'm reading from an XML file that contains the line
[1] ...Bergson referred as "durée"; the way...
Then I parse the file with XML::DOM::Parser and print it out again.
The line now becomes:
[2] ...Bergson referred as "dur㩥; the way...
Where can this possibly come from? Doesn't "standard" reading and printing
produce UTF-8? And doesn't XML::DOM::Parser read its input as UTF-8?
If so, shouldn't the output be UTF-8 again when I print it?
The file containing line [1] (file2.xml) was written by this script:
#!/usr/bin/perl
use strict;
use warnings;
use encoding 'utf-8';
my $infile = "file1.xml";
open IN, "$infile" or die "\ncannot read specified infile\n";
my $text = join "", <IN>;
close IN;
# some processing...
my $outfile = "file2.xml";
open OUT, ">:encoding(utf-8)", $outfile or die "cannot create out file";
print OUT $text;
close OUT;
# alternatively I tried:
# open IN, "<:encoding(utf-8)", "$infile"; # and
# open OUT, ">$outfile" or die "cannot create out file";
# respectively. It makes no difference.
The second script reads/writes like this:
#!/usr/bin/perl
use strict;
use XML::DOM;
use warnings;
my $infile = "file2.xml";
my $dom_parser = new XML::DOM::Parser();
my $TREE = $dom_parser->parsefile($infile);
open OUT, ">file3.xml" or die "could not open log file";
print OUT $TREE->toString();
close OUT;
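One possible explanation, offered as an assumption rather than a confirmed diagnosis: if toString() returns a character string (not verified here), then printing it to a handle opened without an encoding layer writes "é" as the single Latin-1 byte E9 instead of the UTF-8 bytes C3 A9. The sketch below uses only core Perl and in-memory filehandles to show that difference; the variable names are made up for illustration and XML::DOM is not involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A character string such as a parser might return: "é" is the single
# character U+00E9, not the two UTF-8 bytes C3 A9. (Illustrative data.)
my $text = "dur\x{e9}e";

# Handle without an encoding layer: characters below U+0100 are
# written as single Latin-1 bytes.
my $raw = '';
open my $plain, '>', \$raw or die "cannot open in-memory handle: $!";
print {$plain} $text;
close $plain;

# Handle with an explicit UTF-8 layer: the same string is encoded on output.
my $encoded = '';
open my $layered, '>:encoding(UTF-8)', \$encoded
    or die "cannot open in-memory handle: $!";
print {$layered} $text;
close $layered;

# Show the bytes actually written in each case.
printf "no layer:    %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw;
printf "UTF-8 layer: %s\n", join ' ', map { sprintf '%02X', ord } split //, $encoded;
```

If that assumption holds for the second script, opening file3.xml with ">:encoding(UTF-8)" instead of a bare ">" would be the first thing to try.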
Thanks for any comments!
Alois Heuboeck