RE: How to parse using DOM

Jesse Pelton Tue, 27 Nov 2007 10:55:15 -0800

Why do you need an integer?  You can just as easily loop on an XMLSize_t value:


  DOMNodeList *child_contents = root->getChildNodes();
  for (XMLSize_t index = 0; index < child_contents->getLength(); index++)
  {
    DOMNode * child = child_contents->item(index);

    // do something useful
  }

I show child_contents->getLength() being tested in the for() statement because 
DOMNodeLists are "live" and can change length if nodes are added or removed in 
the loop.  If you know for sure that your DOM won't change during processing, 
you can record the list's length outside the loop and save a function call on 
each iteration.  If the DOM can change, odds are that my simple-minded 
defensive code will just be the starting point for the code you'll need to 
handle the changes.  For instance, if you add a single node prior to the one 
you're currently processing, the "next" node you fetch will be the same as the 
current one unless you increment the index to take the addition into account.
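
The index adjustment described above can be sketched without Xerces. This is a minimal stand-in, not Xerces code: a std::vector<std::string> plays the role of the live DOMNodeList, and the hypothetical function visit_with_insert adds one element just before a chosen node during the loop. It assumes only what the paragraph above states, namely that an insertion ahead of the current position shifts the current node one slot to the right.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a live node list: a plain vector of node names.
// Hypothetical helper, not part of Xerces-C.
//
// Visits every element of `nodes` in order. When it reaches `trigger`,
// it inserts `added` immediately before it, the way a DOM edit might
// add a node ahead of the one being processed. Returns the names in
// the order they were visited.
std::vector<std::string> visit_with_insert(std::vector<std::string> nodes,
                                           const std::string& trigger,
                                           const std::string& added,
                                           bool adjust_index)
{
    std::vector<std::string> visited;
    bool inserted = false;

    // Re-read size() on every pass because the list can grow.
    for (std::size_t index = 0; index < nodes.size(); ++index)
    {
        // Copy the name: the insert below may reallocate the vector.
        const std::string current = nodes[index];
        visited.push_back(current);

        if (!inserted && current == trigger)
        {
            nodes.insert(nodes.begin() + static_cast<std::ptrdiff_t>(index),
                         added);
            inserted = true;

            // The current node now lives at index + 1. Without this
            // adjustment the next iteration fetches it a second time.
            if (adjust_index)
                ++index;
        }
    }
    return visited;
}
```

With adjust_index set to false the trigger node is visited twice; with it set to true each original node is visited once. In both cases the newly inserted node is skipped, which may or may not be what your processing needs.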

But I use a different approach, as I suspect most other people do:

  DOMNode * child;
  for (child =  root->getFirstChild();
       child != NULL;
       child =  child->getNextSibling())
  {
    // do something useful
  }

This approach saves the time and space needed to create the 
DOMNodeList.  Since this sort of processing is often recursive, any 
optimization will be multiplied.  At least as important, I also find this 
easier to understand, so I'm more likely to get it right (or to notice if I 
haven't).  It assumes that the object pointed to by "child" is never removed 
while it is being processed.  If removal is possible, you'll need to fetch the 
next sibling before the child is removed.
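
The "fetch the next sibling first" rule can be sketched with a small stand-in. The Node struct and the remove_child and prune_children helpers below are hypothetical and are not the Xerces-C classes; they model only the first-child and next-sibling links. The sketch also assumes that a removed node loses its sibling link, which is the situation the rule guards against.

```cpp
#include <string>
#include <vector>

// Minimal stand-in for a DOM node; NOT the Xerces-C DOMNode class.
// Only the links needed for sibling traversal are modeled.
struct Node
{
    std::string name;
    Node* first_child;
    Node* next_sibling;

    explicit Node(const std::string& n)
        : name(n), first_child(nullptr), next_sibling(nullptr) {}
};

// Hypothetical helper: unlinks `child` from `parent` and clears its
// sibling link, so the removed node no longer leads anywhere.
void remove_child(Node& parent, Node* child)
{
    Node** link = &parent.first_child;
    while (*link != nullptr && *link != child)
        link = &(*link)->next_sibling;

    if (*link != nullptr)  // found: *link == child
    {
        *link = child->next_sibling;
        child->next_sibling = nullptr;
    }
}

// Removes every child whose name equals `unwanted` and returns the
// names of the children that remain, in document order.
std::vector<std::string> prune_children(Node& parent,
                                        const std::string& unwanted)
{
    std::vector<std::string> kept;
    Node* child = parent.first_child;
    while (child != nullptr)
    {
        // Fetch the next sibling first: after removal, `child` no
        // longer has a usable sibling link.
        Node* next = child->next_sibling;

        if (child->name == unwanted)
            remove_child(parent, child);
        else
            kept.push_back(child->name);

        child = next;
    }
    return kept;
}
```

Had the loop advanced with child->next_sibling after the removal, it would have stopped at the first removed node and silently skipped the rest of the children.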

-----Original Message-----
From: Javier Gálvez Guerrero [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 27, 2007 1:09 PM
To: [email protected]
Subject: Re: How to parse using DOM

Thank you again for all your answers. I hope you are not fed up with me
yet..haha.

Now I want to obtain an int (c++) in order to process all the children of
the root node in the DOM representation, so:

DOMNodeList *child_contents = root->getChildNodes();

child_contents->getLength() gives me an XMLSize_t value as a result. How
can I convert it to an int?

Thank you a lot again,
Javi


2007/11/27, Jesse Pelton <[EMAIL PROTECTED]>:
>
> http://xerces.apache.org/xerces-c/apiDocs/functions_0x67.html provides a
> list of methods implemented by Xerces-C; if you look at it, you'll find
> getDocument() is a method of AbstractDOMParser.  Click the method name and
> you'll find a brief description, including the fact that it returns a
> DOMDocument pointer. Click DOMDocument and you'll find that it has a
> getDocumentElement() method "that allows direct access to the child node
> that is the root element of the document."  Given this node, you can use
> getChildNodes(), getFirstChild(), getNextSibling(), and so on to directly
> navigate the DOM.
>
> Alternately, you can use getElementsByTagName() to obtain a list of
> elements with a given name or getElementById() to get an element with a
> unique ID.  Or use DOMTreeWalker to work with a subset of your
> document.  I've never done that, so it's left as an exercise for the
> student.
>
> -----Original Message-----
> From: Javier Gálvez Guerrero [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 27, 2007 10:46 AM
> To: [email protected]
> Subject: Re: How to parse using DOM
>
> Thank you all very much, especially to Sven who shared his own effort.
>
> However, I have looked into the samples on the repositories site and I
> can't find how to "extract" the data itself from the DOM tree. If you
> don't mind, I would like to ask some simple questions so that, with your
> answers, I can start typing code.
>
> //get the DOM representation
> DOMNode *doc = parser->getDocument();
>
> I cannot find the description of the getDocument method in the provided
> documentation and I am quite confused about it. Anyway, *doc is supposed
> to be the DOM representation. So, what I need to do is extract many
> elements (with their children and attributes) from an XML file, which is
> supposed to be represented by *doc once it has been parsed. Then, how can
> I get, let's say, the value of the "nickname" element inside its parent
> element "user"? Does getDocument return the root node? If so, I guess I
> can ask it for its children and then "move" through the tree with methods
> of the DOMNode API, for example getNodeValue(), getChildNodes() and so on.
>
> Is that OK? Is there any other way to extract data from the DOM
> representation, or is this the one to use?
>
> Thank you all very much again and sorry for the inconvenience. I am really
> interested in using Xerces in the application I am developing, so that's
> why I would like to know how to use it properly.
>
> Cheers,
> Javi
>
>
>
>
> 2007/11/27, David Bertoni <[EMAIL PROTECTED]>:
> >
> > Sven Bauhan wrote:
> > > Hi Javi,
> > >
> > > the Xerces interface is not really intuitive. A short description can
> > > be found in the DOM programming guide:
> > > http://xerces.apache.org/xerces-c/program.html
> > >
> > > The Xerces documentation often suggests using an extra class for the
> > > conversion between std::string and XMLCh*. I have written such a
> > > class. As it is quite short, I attach it here.
> > Your class uses XMLString::transcode(), which transcodes to the local
> > code page.  This will result in data loss in cases where content contains
> > Unicode characters that are not representable in the local code page.  A
> > better choice would be to transcode to UTF-8, which is compatible with
> > char* APIs, and has the advantage that it can represent any Unicode
> > character.
> >
> > There are many postings in the archives that will illustrate why using
> > XMLString::transcode() is a bad idea.  I wish we would actually modify
> > the analogous class in our samples so it doesn't do local code page
> > transcoding, as it's providing a bad example.
> >
> > Dave
> >
>
