Hi,
Quite simply, that is not well-formed XML.
The ampersand is a 'special' character: a bare & starts a reference, so
to get the character itself in text you must write its entity
reference ( &amp; ) instead. That is why your text stops at "BB".
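If the documents are produced by your own code, the escaping can be done when the text is written out. Below is a minimal sketch in plain C++, not part of Xerces; the name escapeXmlText is made up for illustration, and it assumes the input is raw text that has not been escaped already (running it over text that already contains &amp; would escape it twice).

```cpp
#include <string>

// Hypothetical helper (not a Xerces API): replaces the characters that are
// special in XML character data with their predefined entity references.
// Assumes `raw` is unescaped text; already-escaped input would be escaped twice.
std::string escapeXmlText(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case '&': out += "&amp;"; break;  // must always be escaped in text
            case '<': out += "&lt;";  break;  // must always be escaped in text
            case '>': out += "&gt;";  break;  // optional in most places, harmless
            default:  out += c;       break;
        }
    }
    return out;
}
```

With this, escapeXmlText("BB&T Corp.") gives "BB&amp;T Corp.", which the parser will hand back to you as "BB&T Corp." in the #text node.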
You should find plenty of material on this via search engines or
basic XML tutorials. You can get the full XML specification from
www.w3.org, but the following two articles should suffice and provide
further pointers for related reading:
http://www.xml.com/pub/a/2003/02/26/qa.html
http://www.xml.com/pub/a/2001/01/31/qanda.html
Regards
Dara
Xiaolei Li wrote:
Hi,
I'm trying to read in all the #text nodes in a set of XML documents,
but I'm running into problems when the document content includes
ampersands (&) in the text.
So given a document path, I use XercesDOMParser to get the root
DOMNode*. Using that node, I traverse the entire tree looking for
#text nodes. Whenever I see a #text node, I call getNodeValue() and
run XMLString::transcode() on the result to get the char*.
This works fine until I run into a document that has & in its
content. For example,
=========================
...
<TEXT>
Maryland Federal Bancorp Inc., a Hyattsville-based thrift, announced
yesterday that it will be acquired by BB&T Corp. of Winston-Salem,
N.C., for $265.3 million in stock.
...
=========================
For some reason, the char* I get back from XMLString::transcode()
only gives me the text up to "BB" (in "BB&T"). If I manually delete
the & from the file, it'll parse just fine. So basically, the "&" is
ending the text prematurely.
I'm a total XML noob so I have no clue what to do here. I'm probably
just missing something very basic. Any guidance would be greatly
appreciated.
Thank you.
-Xiaolei