Hi

I'm trying to insert text into an existing XML tree. The problem I'm
encountering is that the text may contain entity references (including
the 'forbidden' '&'), in which case the & is escaped as '&amp;' in the
output. I'm using the module XML::DOM for this.



Here's an example of an empty tree (the file 0061a.xml):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI.2 SYSTEM "tei_bawe.dtd">
<TEI.2>
<teiHeader>
<titleStmt>
<title/>
</titleStmt>
</teiHeader>
<text>
</text>
</TEI.2>


-------

Here's my script:

#!/usr/bin/perl
use strict;
use XML::DOM;
use warnings;


my $titleText = "Die Br&uuml;cke.";
my $infile = "0061a.xml";

my $dom_parser = XML::DOM::Parser->new;
my $TREE = $dom_parser->parsefile($infile)
    or die "\ncannot parse file input [$infile]\n";
$TREE->normalize();

my $root = $TREE->getDocumentElement();
my $title = ${$root->getElementsByTagName("title", 1)}[0];

$title->addText($titleText);
print "$titleText\n"; # for testing: Die Br&uuml;cke.
print $title->toString(); # for testing: <title>Die Br&amp;uuml;cke.</title>

open my $out, '>', '0061a.out.xml' or die "cannot write to 0061a.out.xml: $!";
print $out $TREE->toString();
close $out or die "cannot close 0061a.out.xml: $!";


-------

The output file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI.2 SYSTEM "tei_bawe.dtd">
<TEI.2>
<teiHeader>
<titleStmt>
<title>Die Br&amp;uuml;cke.</title>
</titleStmt>
</teiHeader>
<text>
</text>
</TEI.2>


-------

- whereas I'd like to get
<title>Die Br&uuml;cke.</title>
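
A possible direction, as an untested sketch only: addText() stores the
whole string as a text node, so every '&' is escaped when the tree is
written out. Splitting the string into plain-text pieces and entity
names first would allow each kind to be added as its own node. The
helper name split_entities is made up for illustration, and the
XML::DOM calls mentioned in the comments (addText, appendChild,
createEntityReference) are an assumption about how the pieces would be
attached; they are not exercised here. It also assumes that uuml is
declared in tei_bawe.dtd.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::DOM): split a string into
# alternating plain-text and named-entity pieces.
sub split_entities {
    my ($string) = @_;
    my @pieces;
    # The capture group makes split() return the entity names too, so
    # the result alternates: text, name, text, name, ...
    my @parts = split /&([A-Za-z][A-Za-z0-9]*);/, $string, -1;
    while (@parts) {
        my $text = shift @parts;
        push @pieces, [ text => $text ] if length $text;
        if (@parts) {
            my $name = shift @parts;
            push @pieces, [ entity => $name ];
        }
    }
    return @pieces;
}

for my $piece (split_entities("Die Br&uuml;cke.")) {
    my ($kind, $value) = @$piece;
    print "$kind: $value\n";
    # Assumed XML::DOM usage per piece (untested):
    #   text   -> $title->addText($value);
    #   entity -> $title->appendChild($TREE->createEntityReference($value));
}
```

A bare '&' that is not part of a named reference stays inside a text
piece, so it would still be escaped as '&amp;', which is what
well-formed XML requires.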


Thanks for any suggestions!

Alois

