RE: Parsing Nested XML tags in Xerces-C

Gaurav Kumar Thu, 25 Mar 2010 05:40:36 -0700

Hi John,

I've been reading the API documentation. As you have noticed, I have a
<LOCATE_protein> node that contains four child nodes holding different
information. I can extract the attributes of the <LOCATE_protein> node.
Can you suggest which function I should use to check for the presence of
a specific node, i.e. whether child nodes such as <externalannot>,
<literature>, <direct_interaction> or <metabolic_interaction> are present
or absent? Once I know the node types, I can extract the information
inside the nested nodes.


Thanks in advance.

cheers
Gaurav
> You are on the right track, just create another nested child loop and look
> for the items in there.   You didn't say what you want to do when you find
> the children, but this will at least allow you to find specific elements
> using your technique.
>
> john
>
> -----Original Message-----
> From: Gaurav Kumar [mailto:[email protected]]
> Sent: Monday, March 22, 2010 8:18 PM
> To: [email protected]
> Subject: Parsing Nested XML tags in Xerces-C
>
>
>  Hi,
>
>  I'm new to Xerces-C and not sure of many concepts within this API. I
>  thought I would learn this useful API by following tutorials and
>  problems discussed on the mailing list.
>
>  I'm able to extract the attributes of the <LOCATE_protein> tag. This tag
>  contains nested children. I need to traverse the XML tree to fetch the
>  required information from the nested tags (child or grandchild nodes).
>  Can anyone suggest a simple function to do that in Xerces-C? Below is
>  the sample XML file and the code, modified from
>  http://www.yolinux.com/TUTORIALS/XML-Xerces-C.html.
>
>  Thanks in advance
>
>  Cheers
>  Gaurav
>  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>  <LOCATE_interaction
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>       <LOCATE_protein uid="6000002" uniprot="P27824" refseq="">
>            <externalannot>
>                 <source db="HPRD" db_id="00252"
>  goid="GO:0005764">Lysosomes</source>
>                 <source db="HPRD" db_id="00252" goid="GO:0005635">Nuclear
>  Envelope</source>
>                 <source db="HPRD" db_id="00252" goid="GO:0005794">Golgi
>  Apparatus</source>
>                 <source db="HPRD" db_id="00252"
>  goid="GO:0005783">Endoplasmic Reticulum</source>
>                <source db="HPRD" db_id="00252" goid="GO:0005886">Plasma
>  Membrane</source>
>                 <source db="UniProt/SPTrEMBL" db_id="P27824"
>  goid="GO:0005783">endoplasmic reticulum</source>
>                 <source db="UniProt/SPTrEMBL" db_id="P27824"
>  goid="GO:0042470">melanosome</source>
>            </externalannot>
>            <literature></literature>
>            <direct_interaction>
>                 <entry source="HPRD" source_id="00252" uniprot="P27824"
>  refseq="NP_001737.1">
>                      <name>Calnexin</name>
>                      <interactor type="direct" pubmed_id="8136357">
>                           <molecule source_id="00127" gene_symbol="IFNGR1"
>  uniprot="P15260" refseq="">Interferon gamma
>  receptor 1</molecule>
>                      </interactor>
>                 </entry>
>            </direct_interaction>
>            <metabolic_interaction>
>                 <entry source_id="hsa:55832">
>                      <gene_name>CAND1</gene_name>
>                      <defination>cullin-associated and
>  neddylation-dissociated 1</defination>
>                      <orthology></orthology>
>                      <class></class>
>                      <enzyme></enzyme>
>                 </entry>
>                <entry source_id="ENSG00000111530-MONOMER"></entry>
>            </metabolic_interaction>
>  </LOCATE_protein>
>       ....
>       ....
>       .....
>  </LOCATE_interaction>
>
>
>
>   m_ConfigFileParser->parse( configFile.c_str() );
>
>        DOMDocument* xmlDoc = m_ConfigFileParser->getDocument();
>
>        DOMElement* elementRoot = xmlDoc->getDocumentElement();
>        if( !elementRoot ) throw(std::runtime_error( "empty XML document"
>  ));
>      DOMNodeList*      children = elementRoot->getChildNodes();
>
>        cout << "Total Locates Proteins : " << children->getLength() <<
>  endl;
>
>       for( XMLSize_t xx = 0; xx < children->getLength(); ++xx )
>        {
>           DOMNode* currentNode = children->item(xx);
>           if( currentNode->getNodeType() == DOMNode::ELEMENT_NODE )
>           {
>              // Found node which is an Element. Re-cast node as element
>              DOMElement* currentElement
>                          = dynamic_cast< xercesc::DOMElement* >(
>  currentNode );
>           //cout << currentElement << endl;
>              if(
>  XMLString::equals(currentElement->getTagName(),TAG_locateProtein))
>              {
>                 // Already tested node as type element and of name
>                 // "LOCATE_protein".
>                 // Read attributes of element "LOCATE_protein".
>             const XMLCh* xmlch_locateID
>                      = currentElement->getAttribute(ATTR_locateID);
>                m_locateID = XMLString::transcode(xmlch_locateID);
>
>             const XMLCh* xmlch_locateUniprotID
>                   = currentElement->getAttribute(ATTR_locateUniprotID);
>             m_locateUniprotID = XMLString::transcode(xmlch_locateUniprotID);
>
>             const XMLCh* xmlch_locateRefseqID
>                   = currentElement->getAttribute(ATTR_locateRefseqID);
>             m_locateRefseqID = XMLString::transcode(xmlch_locateRefseqID);
>
>             cout << "Locate ID:"
>                  << m_locateID
>                  << "|UniprotID:"
>                  << m_locateUniprotID
>                  << "|RefseqID:"
>                  << m_locateRefseqID
>                  << endl;
>
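>             // First child may be null or a whitespace text node; transcode XMLCh* to print.
>             DOMNode* currentChild = currentNode->getFirstChild();
>             if( currentChild )
>             {
>                char* text = XMLString::transcode(currentChild->getTextContent());
>                char* name = XMLString::transcode(currentChild->getNodeName());
>                cout << text << endl << name << endl;
>                XMLString::release(&text);
>                XMLString::release(&name);
>             }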
>       }
>      }
>  }
>
>
>
> --
> Mr. Gaurav Kumar
> PhD Student (Bioinformatics/Computational Biology)
>
>


-- 
Mr. Gaurav Kumar
PhD Student (Bioinformatics/Computational Biology)
