Hi Enzo,
let me try a different approach.
- the XML on file is a serialized version of a
memory model called the "infoset"
- you retrieve an infoset by parsing the XML file with a parser
- you get an XML representation of the infoset by
either serializing it (using a serializer) or by
creating it by hand (e.g. with a lot of printf)
- if the infoset contains a text node containing
a single '&', the XML on file will contain the
string "&amp;"; also, if the infoset contains
'(', the XML could contain either '(', "&#x28;",
"&#40;" or "&lpar;" (if you have an entity
declaration somewhere that maps "lpar" to '(')
So, there is a non-unique mapping between what
you see in an XML file and what is represented in memory after parsing.
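The many-to-one mapping above can be made concrete with a small standalone decoder. This is not Xerces code: `decode_reference` is a hypothetical helper that resolves one XML spelling of a character to its Unicode code point, and the `declared` map is an assumed, much-simplified stand-in for the entity declarations of a DTD.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: resolves one XML spelling of a character ("(",
// "&#x28;", "&#40;", "&lpar;", ...) to its Unicode code point.
// `declared` is an assumed stand-in for internal entity declarations.
std::optional<std::uint32_t> decode_reference(
    const std::string& token,
    const std::map<std::string, std::uint32_t>& declared = {})
{
    // A bare single ASCII character (other than markup) stands for itself.
    if (token.size() == 1 && token[0] != '&' && token[0] != '<')
        return static_cast<unsigned char>(token[0]);

    // Everything else must have the shape "&...;".
    if (token.size() < 3 || token.front() != '&' || token.back() != ';')
        return std::nullopt;
    const std::string body = token.substr(1, token.size() - 2);

    // Character references: "&#xHH;" (hexadecimal) or "&#DD;" (decimal).
    if (body[0] == '#') {
        const bool hex = body.size() > 1 && body[1] == 'x';
        const std::string digits = body.substr(hex ? 2 : 1);
        if (digits.empty()) return std::nullopt;
        try {
            std::size_t used = 0;
            const unsigned long value = std::stoul(digits, &used, hex ? 16 : 10);
            if (used != digits.size()) return std::nullopt;
            return static_cast<std::uint32_t>(value);
        } catch (...) {
            return std::nullopt;  // not a number, or out of range
        }
    }

    // The five predefined entities first, then any declared ones.
    static const std::map<std::string, std::uint32_t> predefined = {
        {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"apos", '\''}, {"quot", '"'}};
    if (auto it = predefined.find(body); it != predefined.end()) return it->second;
    if (auto it = declared.find(body); it != declared.end()) return it->second;
    return std::nullopt;  // undeclared entity
}
```

With `{"lpar", 0x28}` declared, "(", "&#x28;", "&#40;" and "&lpar;" all decode to the same code point 0x28, which is why the file text alone does not tell you which spelling will be produced when the infoset is serialized again.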
Now, you are manipulating a DOM tree by creating
DOMText and DOMElement nodes; DOM is an object model
built on top of the infoset, so what you are
writing there goes through this mapping before becoming an XML file.
But you shouldn't care, provided that the XML
that comes out of the serialization is a valid
representation of the infoset containing the data
you want. So, you should store in the DOMText the
*data* you need to store, e.g. the "( ¤ ¥ )" string, by calling, as
we told you earlier,
dtxt = pDoc->createTextNode( X(" start ' < > & \x28 \xA4 \xA5 ) end") );
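To illustrate why storing the plain *data* is enough, here is a self-contained sketch that does not use Xerces. `escape_text` and `unescape_text` are hypothetical, simplified stand-ins for what a serializer does to text content and what a parser does when reading it back; a real serializer may also emit character references for characters the output encoding cannot represent, and the text is assumed to be UTF-8 in a std::string rather than Xerces' UTF-16 XMLCh.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Simplified model of a serializer: in text content only the characters
// that would be misread as markup are replaced by references.
std::string escape_text(const std::string& data) {
    std::string out;
    for (char c : data) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Simplified model of a parser: turns the predefined entity references
// back into characters. All other bytes (including the UTF-8 sequences
// of characters such as U+00A4 and U+00A5) pass through unchanged.
std::string unescape_text(const std::string& xml) {
    static const std::pair<const char*, char> refs[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'}, {"&apos;", '\''}, {"&quot;", '"'}};
    std::string out;
    std::size_t i = 0;
    while (i < xml.size()) {
        bool matched = false;
        if (xml[i] == '&') {
            for (const auto& [ref, ch] : refs) {
                const std::string r(ref);
                if (xml.compare(i, r.size(), r) == 0) {  // reference starts at i
                    out += ch;
                    i += r.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += xml[i++];
    }
    return out;
}
```

The property that matters is `unescape_text(escape_text(data)) == data`: whatever you put in the text node, including a literal '&', is exactly what you get back after saving and reloading, even though the file itself contains "&amp;".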
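As an alternative, you can choose to directly
generate the XML representation of your data, e.g. with an XMLFormatter
that writes without escaping:
formatter << XMLFormatter::NoEscapes << X("<node>start &apos; &lt; &gt; &amp; &#x28; &#xA4;....</node>");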
Hope this clears the matter once and for all,
Alberto
At 14.47 19/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
Do you mean that it is not possible to store a value like xA5 using DOM?
I tried loading a file with extra characters using XercesDOMParser; then I got
the DOM from the parser and printed it, and it contains the extra characters.
Output:
DOCUMENT: <?xml version="1.0" encoding="UTF-16" standalone="no"
?><Messaggio>
<Test1> start ' < > & ( ¤ ¥ ) end </Test1>
</Messaggio>
Press Enter to continue!
*****************************************
* input file
*****************************************
<?xml version="1.0" encoding="UTF-16" standalone="no" ?>
<Messaggio>
<Test1> start ' < > & ( ¤ ¥ ) end
</Test1>
</Messaggio>
*****************************************
* reading the file with XercesDOMParser *
*****************************************
XercesDOMParser * domParser = new XercesDOMParser;
// -------------------------------------------------------
domParser->setValidationScheme( XercesDOMParser::Val_Auto );
domParser->setDoNamespaces( false );
domParser->setDoSchema( false );
domParser->setValidationSchemaFullChecking( false );
domParser->setCreateEntityReferenceNodes( false );
// DOMTreeErrorReporter derives from ErrorHandler, so no cast is needed
DOMTreeErrorReporter * errReporter = new DOMTreeErrorReporter();
domParser->setErrorHandler( errReporter );
string sfile( "/test/test1.xml" );
domParser->parse( sfile.c_str() );
int nerr = domParser->getErrorCount();
if( nerr > 0 )
{
    MYLOG( ae_util::format_string( "PARSE FAILED file=[%s] num.err=%d ",
                                   sfile.c_str(), nerr ));
    delete domParser;      // release the parser before its error handler
    delete errReporter;
    return IRET_ERROR;
}
DOMDocument * pDoc = domParser->getDocument();   // owned by domParser
string stmp = ManagCmd::GetStringFromDOMDocument( pDoc );
cout << "DOCUMENT: " + stmp << endl;
delete domParser;          // also releases pDoc
delete errReporter;
§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§
Why does this code behave differently?
DOMImplementation * impl =
    DOMImplementationRegistry::getDOMImplementation( X("LS") );
if( impl != NULL )
{
    DOMDocument * pDoc = impl->createDocument( 0, X("Messaggio"), 0 );
    pDoc->setEncoding( X("UTF-16") );
    DOMElement * pRoot = pDoc->getDocumentElement();
    DOMElement * pTest = pDoc->createElement( X("TEST2") );
    pRoot->appendChild( pTest );
    string stmp = " start &apos; &lt; &gt; &amp; &#x28; &#xA4; &#xA5; &#x29; end";
    DOMText * dtxt = pDoc->createTextNode( X( stmp.c_str() ) );
    pTest->appendChild( dtxt );
    stmp = ManagCmd::GetStringFromDOMDocument( pDoc );
    cout << "DOCUMENT: " + stmp << endl;
    pDoc->release();    // documents created through DOMImplementation must be released explicitly
}
**************
* output:
**************
DOCUMENT: <?xml version="1.0" encoding="UTF-16" standalone="no" ?><Messaggio><TEST2> start &amp;apos; &amp;lt; &amp;gt; &amp;amp;
&amp;#x28; &amp;#xA4; &amp;#xA5; &amp;#x29; end</TEST2></Messaggio>
Press Enter to continue!
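The difference between the two programs can be reproduced without Xerces. When a file is parsed, the parser resolves any character or entity references, so the DOM holds the characters themselves; in the second program the string given to createTextNode contains literal reference text such as `&#xA5;`, and the serializer has to escape its '&'. A minimal sketch, assuming UTF-8 text in std::string (Xerces itself works with UTF-16 XMLCh strings); `utf8` and `escape_text` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes a (valid, assumed) Unicode code point as UTF-8.
std::string utf8(std::uint32_t cp) {
    std::string s;
    if (cp < 0x80) {
        s += static_cast<char>(cp);
    } else if (cp < 0x800) {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return s;
}

// Simplified escaping a serializer applies to text-node data.
std::string escape_text(const std::string& data) {
    std::string out;
    for (char c : data) {
        if (c == '&')      out += "&amp;";
        else if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else               out += c;
    }
    return out;
}
```

`escape_text("&#xA5;")` yields "&amp;#xA5;", the doubled escaping seen in the output above, whereas storing the character U+00A5 itself (`utf8(0xA5)`) leaves it untouched; how it then appears in the file is up to the serializer and the output encoding.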
-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED]
Sent: Monday, 19 September 2005 10.28
To: c-dev@xerces.apache.org
Subject: Re: R: R: R: using non standard character with zerces
Hi Enzo,
if you want to place reserved characters in the
final XML, you should not use DOM. When you
create a DOMText node you are asking "this is the
text you must store, be sure that it is stored in
a way that, when later retrieved, it's still this
text". So, if you use reserved characters like
"&", they get expanded into "&" so that, upon
loading, you find "&" in the corresponding
DOMText. If you need to manually compose an XML
you are better off using XMLFormatter and feeding
it with literals like "<nodename>#</nodename>".
Alberto
At 10.03 19/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
>But what I really need is a very simple way to put a sequence of characters,
>including the & character, inside the XML stream without the latter being
>parsed and translated into &amp;.
>With Xerces, is there no way to tell the parser not to translate some
>or all of the characters of an output string?
>
>-----Original Message-----
>From: Alberto Massari [mailto:[EMAIL PROTECTED]
>Sent: Friday, 16 September 2005 18.51
>To: c-dev@xerces.apache.org
>Subject: Re: R: R: using non standard character with zerces
>
>
>Hi Enzo,
>
>At 18.05 16/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
> >But how can I include special characters inside a node?
> >I want to use the format &#xXX, but the '&' is processed and translated
> >into &amp;, so the character &#xA5 will be converted to &amp;#xA5 instead of
> >the desired character entity.
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]