RE: Parsing Nested XML tags in Xerces-C

Jesse Pelton Thu, 25 Mar 2010 10:47:47 -0700

SAX would also be more suitable if the documents to be processed are large.  
While the DOM hierarchy represents the entire document in memory, SAX makes 
calls to your handlers as a document is parsed.  You get to choose whether and 
how to represent the data in memory.



-----Original Message-----
From: John Lilley [mailto:[email protected]]
Sent: Thu 3/25/2010 12:30 PM
To: [email protected]
Subject: RE: Parsing Nested XML tags in Xerces-C
 
Gaurav,

I don't think that there is a "find child element by tag" function in Xerces; 
we had to write those ourselves (someone on the list feel free to correct me on 
this point).  So just loop over the child elements, get the tag for each one 
and check it against what you are looking for.

There are more efficient approaches involving the SAX parser, but unless 
performance is important this will work well.

john


-----Original Message-----
From: Gaurav Kumar [mailto:[email protected]] 
Sent: Thursday, March 25, 2010 6:40 AM
To: [email protected]
Subject: RE: Parsing Nested XML tags in Xerces-C

Hi John,

I've been reading the API documentation. As you have noticed, I have a node
<LOCATE_protein> that contains four child nodes holding different
information. I can extract the attributes from the <LOCATE_protein>
node. Can you suggest which function I should use to check for the presence
of a specific child node, i.e. whether <externalannot>,
<literature>, <direct_interaction> or <metabolic_interaction> is
present or absent? Once I know which child nodes are present, I can extract
the information inside each nested node.

Thanks in advance.

cheers
Gaurav
> You are on the right track, just create another nested child loop and look
> for the items in there.   You didn't say what you want to do when you find
> the children, but this will at least allow you to find specific elements
> using your technique.
>
> john
>
> -----Original Message-----
> From: Gaurav Kumar [mailto:[email protected]]
> Sent: Monday, March 22, 2010 8:18 PM
> To: [email protected]
> Subject: Parsing Nested XML tags in Xerces-C
>
>
>  Hi,
>
>  I'm new to Xerces-C and not sure of many concepts within this API. I
>  thought I would learn this useful API by following tutorials and problems
>  discussed on the mailing list.
>
>  I'm able to extract the attributes of the tag <LOCATE_protein>. This tag
>  contains nested children. I need to traverse the XML tree to fetch
>  the required information from the nested tags (child or grandchild
>  nodes). Can anyone suggest a simple function to do that in Xerces-C?
>  Below are the sample XML file and the modified code
>  (http://www.yolinux.com/TUTORIALS/XML-Xerces-C.html).
>
>  Thanks in advance
>
>  Cheers
>  Gaurav
>  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>  <LOCATE_interaction xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>       <LOCATE_protein uid="6000002" uniprot="P27824" refseq="">
>            <externalannot>
>                 <source db="HPRD" db_id="00252" goid="GO:0005764">Lysosomes</source>
>                 <source db="HPRD" db_id="00252" goid="GO:0005635">Nuclear Envelope</source>
>                 <source db="HPRD" db_id="00252" goid="GO:0005794">Golgi Apparatus</source>
>                 <source db="HPRD" db_id="00252" goid="GO:0005783">Endoplasmic Reticulum</source>
>                 <source db="HPRD" db_id="00252" goid="GO:0005886">Plasma Membrane</source>
>                 <source db="UniProt/SPTrEMBL" db_id="P27824" goid="GO:0005783">endoplasmic reticulum</source>
>                 <source db="UniProt/SPTrEMBL" db_id="P27824" goid="GO:0042470">melanosome</source>
>            </externalannot>
>            <literature></literature>
>            <direct_interaction>
>                 <entry source="HPRD" source_id="00252" uniprot="P27824" refseq="NP_001737.1">
>                      <name>Calnexin</name>
>                      <interactor type="direct" pubmed_id="8136357">
>                           <molecule source_id="00127" gene_symbol="IFNGR1" uniprot="P15260" refseq="">Interferon gamma receptor 1</molecule>
>                      </interactor>
>                 </entry>
>            </direct_interaction>
>            <metabolic_interaction>
>                 <entry source_id="hsa:55832">
>                      <gene_name>CAND1</gene_name>
>                      <defination>cullin-associated and neddylation-dissociated 1</defination>
>                      <orthology></orthology>
>                      <class></class>
>                      <enzyme></enzyme>
>                 </entry>
>                 <entry source_id="ENSG00000111530-MONOMER"></entry>
>            </metabolic_interaction>
>       </LOCATE_protein>
>       ....
>       ....
>       .....
>  </LOCATE_interaction>
>
>
>
>  m_ConfigFileParser->parse( configFile.c_str() );
>
>  DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();
>
>  DOMElement* elementRoot = xmlDoc->getDocumentElement();
>  if( !elementRoot ) throw(std::runtime_error( "empty XML document" ));
>  DOMNodeList* children = elementRoot->getChildNodes();
>
>  cout << "Total Locates Proteins : " << children->getLength() << endl;
>
>  for( XMLSize_t xx = 0; xx < children->getLength(); ++xx )
>  {
>     DOMNode* currentNode = children->item(xx);
>     if( currentNode->getNodeType() == DOMNode::ELEMENT_NODE )
>     {
>        // Found node which is an Element. Re-cast node as element
>        DOMElement* currentElement
>                    = dynamic_cast< xercesc::DOMElement* >( currentNode );
>        //cout << currentElement << endl;
>        if( XMLString::equals(currentElement->getTagName(), TAG_locateProtein) )
>        {
>           // Already tested node as type element and of name "LOCATE_protein".
>           // Read attributes of element "LOCATE_protein".
>           const XMLCh* xmlch_locateID
>                 = currentElement->getAttribute(ATTR_locateID);
>           m_locateID = XMLString::transcode(xmlch_locateID);
>
>           const XMLCh* xmlch_locateUniprotID
>                 = currentElement->getAttribute(ATTR_locateUniprotID);
>           m_locateUniprotID = XMLString::transcode(xmlch_locateUniprotID);
>
>           const XMLCh* xmlch_locateRefseqID
>                 = currentElement->getAttribute(ATTR_locateRefseqID);
>           m_locateRefseqID = XMLString::transcode(xmlch_locateRefseqID);
>
>           cout << "Locate ID:"
>                << m_locateID
>                << "|UniprotID:"
>                << m_locateUniprotID
>                << "|RefseqID:"
>                << m_locateRefseqID
>                << endl;
>
>           // The first child is typically a whitespace text node, not
>           // <externalannot>. getTextContent() returns XMLCh*, so it is
>           // transcoded before printing.
>           DOMNode* currentChild = currentNode->getFirstChild();
>           cout << XMLString::transcode(currentChild->getTextContent()) << endl;
>           cout << XMLString::transcode(currentChild->getNodeName()) << endl;
>        }
>     }
>  }
>
>
>
> --
> Mr. Gaurav Kumar
> PhD Student (Bioinformatics/Computational Biology)
>
>


-- 
Mr. Gaurav Kumar
PhD Student (Bioinformatics/Computational Biology)
