Re: R: R: R: R: using non standard character with zerces

Alberto Massari Mon, 19 Sep 2005 06:06:54 -0700

Hi Enzo,
let me try a different approach.

- the XML on file is a serialized version of a memory model called the "infoset"

- you retrieve an infoset by parsing the XML file, using a parser

- you get an XML representation of the infoset by either serializing it (using a serializer) or by creating it by hand (e.g. with a lot of printf)

- if the infoset contains a text node containing a single '&', the XML on file will contain the string "&amp;"; also, if the infoset contains '(', the XML could contain either '(', "&#x28;", "&#40;" or "&lpar;" (if you have an entity declaration somewhere that maps "lpar" to '(')

So, there is a non-unique mapping between what you see in an XML file and what is represented in memory after parsing.
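
To make the non-unique mapping concrete, here is a minimal sketch in plain C++ (it does not use the Xerces API). `decodeXmlText` and `appendUtf8` are hypothetical helpers invented for this illustration. They only handle the five predefined entities and numeric character references, they emit UTF-8, and they do no error handling for malformed references, so treat this as an approximation of what a parser does, not as its implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to 'out' as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: turn element content as written on file into the
// text that would end up in memory (assumption: no custom entities).
std::string decodeXmlText(const std::string& xml) {
    std::string out;
    for (std::size_t i = 0; i < xml.size(); ) {
        // Look for the closing ';' only when we are at a '&'.
        std::size_t semi =
            (xml[i] == '&') ? xml.find(';', i) : std::string::npos;
        if (semi == std::string::npos) {
            out += xml[i++];            // ordinary character: copy it
            continue;
        }
        const std::string name = xml.substr(i + 1, semi - i - 1);
        if (name == "amp")       out += '&';
        else if (name == "lt")   out += '<';
        else if (name == "gt")   out += '>';
        else if (name == "apos") out += '\'';
        else if (name == "quot") out += '"';
        else if (name.size() > 1 && name[0] == '#') {
            // Numeric character reference: "#40" is decimal, "#x28" is hex.
            const bool hex = (name[1] == 'x');
            const unsigned long cp =
                std::stoul(name.substr(hex ? 2 : 1), nullptr, hex ? 16 : 10);
            appendUtf8(out, static_cast<std::uint32_t>(cp));
        } else {
            out += xml.substr(i, semi - i + 1);  // unknown entity: keep as-is
        }
        i = semi + 1;
    }
    return out;
}
```

With this sketch, `decodeXmlText("(")`, `decodeXmlText("&#40;")` and `decodeXmlText("&#x28;")` all return the same one-character string, which is the point: three different spellings on file, one value in memory.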

Now, you are manipulating a DOM tree by creating DOMText and DOMElement nodes; DOM is an object model built on top of the infoset, so what you write there goes through this mapping before becoming an XML file.

But you shouldn't care, provided that the XML that comes out of the serialization is a valid representation of the infoset containing the data you want. So, you should store in the DOMText the *data* you need to store, e.g. the "( ¤ ¥ )" string, by using the call we suggested earlier:


       dtxt = pDoc->createTextNode( X(" start ' < >  & \x28 \xA4 \xA5 ) end") );

As an alternative, you can choose to directly generate the XML representation of your data, by doing


  formatter << "<node>start &apos; &lt; &gt; &amp; &#x28; &#xA4....</node>";
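
If you go the hand-written route, the escaping becomes your job. Here is a rough sketch in plain C++, independent of XMLFormatter. `toXmlContent` is a hypothetical helper made up for this example. It assumes the data is given as Unicode code points in a `std::u32string`, escapes the markup characters, and writes everything outside ASCII as a hexadecimal character reference.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: produce element content by hand from raw data.
// Assumption: 'text' holds Unicode code points, one per char32_t.
std::string toXmlContent(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp == U'&')      out += "&amp;";
        else if (cp == U'<') out += "&lt;";
        else if (cp == U'>') out += "&gt;";
        else if (cp < 0x80)  out += static_cast<char>(cp);  // plain ASCII
        else {
            // Anything else becomes a numeric character reference,
            // so the result is ASCII whatever the file encoding is.
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%lX;",
                          static_cast<unsigned long>(cp));
            out += buf;
        }
    }
    return out;
}
```

For instance, `toXmlContent(U"( \u00A4 \u00A5 )")` yields `( &#xA4; &#xA5; )`, which you can then wrap in `<node>` and `</node>` yourself.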

Hope this clears up the matter once and for all,
Alberto

At 14.47 19/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:

Do you mean that with DOM it is not possible to store a value like xA5?
I tried loading a file with the extra chars using XercesDOMParser, then I got the
DOM from the parser and printed it, and it has the extra chars.


output:
DOCUMENT: <?xml version="1.0" encoding="UTF-16" standalone="no"
?><Messaggio>
    <Test1> start ' &lt; &gt;  &amp; ( ¤ ¥ )  end  </Test1>
</Messaggio>

Premi Invio per continuare!




*****************************************
* input file
*****************************************
<?xml version="1.0"  encoding="UTF-16" standalone="no" ?>
<Messaggio>
    <Test1> start &apos; &lt; &gt;  &amp; &#x28; &#xA4; &#xA5; &#x29;  end
</Test1>
</Messaggio>



*****************************************
* reading the file with XercesDOMParser *
*****************************************

    XercesDOMParser * domParser;

    // -------------------------------------------------------
    domParser = new XercesDOMParser;
    domParser->setValidationScheme( XercesDOMParser::Val_Auto );
    domParser->setDoNamespaces( false );
    domParser->setDoSchema( false );
    domParser->setValidationSchemaFullChecking( false );
    domParser->setCreateEntityReferenceNodes( false );

    DOMTreeErrorReporter * errReporter = new DOMTreeErrorReporter();
    domParser->setErrorHandler( (ErrorHandler*)  errReporter );

    string sfile( "/test/test1.xml" );
    domParser->parse( sfile.c_str() );
    delete errReporter;
    int nerr = domParser->getErrorCount();
    if( nerr > 0 )
    {
       MYLOG( ae_util::format_string( "PARSE FAILED file=[%s] num.err=%d ",
               sfile.c_str(), nerr ));
       return IRET_ERROR;
    }

    DOMDocument * pDoc = domParser->getDocument();
    stmp = ManagCmd::GetStringFromDOMDocument( pDoc );
    cout << "DOCUMENT: " + stmp << endl;
    delete domParser;

§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§

Why does this code behave differently?

    DOMText * dtxt;
    DOMImplementation * impl =
        DOMImplementationRegistry::getDOMImplementation( X("LS") );

    if( impl != NULL )
    {
       DOMDocument * pDoc = impl->createDocument( 0, X("Messaggio"), 0 );
       pDoc->setEncoding( X("UTF-16") );
       DOMElement * pRoot = pDoc->getDocumentElement();

       DOMElement * pTest = pDoc->createElement( X("TEST2") );
       pRoot->appendChild( pTest );

       stmp = " start &apos; &lt; &gt;  &amp; &#x28; &#xA4; &#xA5; &#x29;  end";
       dtxt = pDoc->createTextNode( X( stmp.c_str()));
       pTest->appendChild( dtxt );
       stmp = ManagCmd::GetStringFromDOMDocument( pDoc );
       cout << "DOCUMENT: " + stmp << endl;
    }


**************
* output:
**************
DOCUMENT: <?xml version="1.0" encoding="UTF-16" standalone="no"
?><Messaggio><TEST2> start &amp;apos; &amp;lt; &amp;gt;  &amp;amp;
&amp;#x28; &amp;#xA4; &amp;#xA5; &amp;#x29;  end</TEST2></Messaggio>

Premi Invio per continuare!



-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED]
Sent: Monday, 19 September 2005 10:28
To: c-dev@xerces.apache.org
Subject: Re: R: R: R: using non standard character with zerces


Hi Enzo,
if you want to place reserved characters in the
final XML, you should not use DOM. When you
create a DOMText node you are asking "this is the
text you must store, be sure that it is stored in
a way that, when later retrieved, it's still this
text". So, if you use reserved characters like
"&", they get expanded into "&amp;" so that, upon
loading, you find "&" in the corresponding
DOMText. If you need to manually compose an XML
you are better off using XMLFormatter and feeding
it with literals like "<nodename>&#x23;</nodename>".
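
As a small illustration of that contract, here is a sketch in plain C++ of the escaping a serializer has to apply to text node data. `escapeXmlText` is a hypothetical helper written for this example, not a Xerces function, and it only covers the three characters that matter inside element content.

```cpp
#include <string>

// Hypothetical helper: escape text node data so that a later parse
// gives back exactly the same characters.
std::string escapeXmlText(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;  // '&' would start a reference
            case '<': out += "&lt;";  break;  // '<' would start a tag
            case '>': out += "&gt;";  break;  // escaped for safety ("]]>")
            default:  out += c;       break;
        }
    }
    return out;
}
```

Here `escapeXmlText("&#xA5;")` gives `&amp;#xA5;`: the six characters you stored are preserved as six characters, which is the same effect you see in your output.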

Alberto

At 10.03 19/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
>But what I really need is a very simple way to put a sequence of chars,
>including the & char, into the XML stream without the latter being parsed
>and translated into &amp;.
>With Xerces, is there no way to tell the parser to avoid translating some
>or all of the characters of an output string?
>
>-----Original Message-----
>From: Alberto Massari [mailto:[EMAIL PROTECTED]
>Sent: Friday, 16 September 2005 18:51
>To: c-dev@xerces.apache.org
>Subject: Re: R: R: using non standard character with zerces
>
>
>Hi Enzo,
>
>At 18.05 16/09/2005 +0200, AESYS S.p.A. [Enzo Arlati] wrote:
> >But how can I include special characters inside a node?
> >I want to use the format &#xXX; but the '&' is processed and translated
> >into &amp;, so the character &#xA5; will be converted to &amp;#xA5; instead
> >of the desired character entity.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



