Re: How to parse using DOM

Javier Gálvez Guerrero Tue, 27 Nov 2007 11:09:30 -0800

Nice!! Thanks a lot! Your approach is quite interesting; I'll see if it
meets my requirements.


I thought that XMLSize_t could not be used as an index like an ordinary
integer. Now that I know it can, I can continue coding.

Cheers,
Javi


2007/11/27, Jesse Pelton <[EMAIL PROTECTED]>:
>
> Why do you need an integer?  You can just as easily loop on an XMLSize_t
> value:
>
>   DOMNodeList *child_contents = root->getChildNodes();
>   for (XMLSize_t index = 0; index < child_contents->getLength(); index++)
>   {
>     DOMNode * child = child_contents->item(index);
>
>     // do something useful
>   }
>
> I show child_contents->getLength() being tested in the for() statement
> because DOMNodeLists are "live" and can change length if nodes are added or
> removed in the loop.  If you know for sure that your DOM won't change during
> processing, you can record the list's length outside the loop and save a
> function call on each iteration.  If the DOM can change, odds are that my
> simple-minded defensive code will just be the starting point for the code
> you'll need to handle the changes.  For instance, if you add a single node
> prior to the one you're currently processing, the "next" node you fetch will
> be the same as the current one unless you increment the index to take the
> addition into account.
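
The index adjustment described above can be sketched without Xerces. This is only an illustration under stated assumptions: a std::vector stands in for the "live" list, and visit_with_insert is a hypothetical name, not part of any Xerces API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a live child list: a plain vector mutated while looping.
// This is NOT the Xerces DOMNodeList class; it only mimics the "live" behaviour.
std::vector<std::string> visit_with_insert(std::vector<std::string> children)
{
    std::vector<std::string> visited;
    // Re-read size() on every pass, because the list may grow inside the loop.
    for (std::size_t index = 0; index < children.size(); index++)
    {
        visited.push_back(children[index]);
        if (children[index] == "b")
        {
            // Insert a new entry before the current one...
            children.insert(children.begin() + static_cast<std::ptrdiff_t>(index), "new");
            // ...and step over it, otherwise "b" would be fetched again.
            index++;
        }
    }
    return visited;
}
```

Without the extra index++ inside the if, the entry "b" would be visited a second time, which is the pitfall the paragraph above warns about.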
>
> But I use a different approach, as I suspect most other people do:
>
>   DOMNode * child;
>   for (child =  root->getFirstChild();
>        child != NULL;
>        child =  child->getNextSibling())
>   {
>     // do something useful
>   }
>
> This approach saves the time and space needed to create the
> DOMNodeList.  Since this sort of processing is often recursive, any
> optimization will be multiplied.  At least as important, I also find this
> easier to understand, so I'm more likely to get it right (or to notice if I
> haven't).  It assumes that the object pointed to by "child" is never removed
> while it is being processed.  If removal is possible, you'll need to fetch
> the next sibling before the child is removed.
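
A minimal sketch of the "fetch the next sibling first" advice, written without Xerces so that it stands alone. The Node struct, remove_child and prune are hypothetical stand-ins; only the method names getFirstChild() and getNextSibling() are borrowed from the loop above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT the Xerces DOMNode class.
// It only offers the two navigation calls used in the loop above.
struct Node
{
    std::string name;
    Node *first_child = nullptr;
    Node *next_sibling = nullptr;

    Node *getFirstChild() const { return first_child; }
    Node *getNextSibling() const { return next_sibling; }
};

// Unlink `child` from `parent`'s sibling chain (hypothetical helper).
void remove_child(Node &parent, Node *child)
{
    Node **link = &parent.first_child;
    while (*link != nullptr && *link != child)
        link = &(*link)->next_sibling;
    if (*link == child)
    {
        *link = child->next_sibling;
        child->next_sibling = nullptr; // a removed node no longer has siblings
    }
}

// Remove every child named `unwanted`; return the names that were visited.
std::vector<std::string> prune(Node &parent, const std::string &unwanted)
{
    std::vector<std::string> visited;
    Node *next = nullptr;
    for (Node *child = parent.getFirstChild(); child != nullptr; child = next)
    {
        // Fetch the next sibling first: after removal, child's own link is gone.
        next = child->getNextSibling();
        visited.push_back(child->name);
        if (child->name == unwanted)
            remove_child(parent, child);
    }
    return visited;
}
```

Had the loop used child = child->getNextSibling() in its increment step, it would stop right after the first removal, because the removed node's sibling link has been cleared.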
>
> -----Original Message-----
> From: Javier Gálvez Guerrero [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 27, 2007 1:09 PM
> To: [email protected]
> Subject: Re: How to parse using DOM
>
> Thank you again for all your answers. I hope you are not fed up with me
> yet..haha.
>
> Now I want to obtain an int (C++) in order to process all the children of
> the root node in the DOM representation, so:
>
> DOMNodeList *child_contents = root->getChildNodes();
>
> child_contents->getLength() gives me an XMLSize_t value. How can I
> convert it to an int?
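
If an int really is required, a checked conversion is one option. The sketch below assumes that XMLSize_t is an unsigned integer type comparable to std::size_t; SizeType and to_int_checked are hypothetical names used only for illustration.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Stand-in for XMLSize_t: assumed here to be an unsigned type like std::size_t.
typedef std::size_t SizeType;

// Convert an unsigned count to int, refusing values that do not fit.
int to_int_checked(SizeType value)
{
    if (value > static_cast<SizeType>(INT_MAX))
        throw std::out_of_range("count does not fit in int");
    return static_cast<int>(value);
}
```

A plain static_cast<int> would also compile, but it silently produces a wrong value for counts larger than INT_MAX, so the range check is kept explicit here.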
>
> Thank you a lot again,
> Javi
>
>
> 2007/11/27, Jesse Pelton <[EMAIL PROTECTED]>:
> >
> > http://xerces.apache.org/xerces-c/apiDocs/functions_0x67.html provides a
> > list of methods implemented by Xerces-C; if you look at it, you'll find
> > getDocument() is a method of AbstractDOMParser.  Click the method name and
> > you'll find a brief description, including the fact that it returns a
> > DOMDocument pointer. Click DOMDocument and you'll find that it has a
> > getDocumentElement() method "that allows direct access to the child node
> > that is the root element of the document."  Given this node, you can use
> > getChildNodes(), getFirstChild(), getNextSibling(), and so on to directly
> > navigate the DOM.
> >
> > Alternately, you can use getElementsByTagName() to obtain a list of
> > elements with a given name or getElementById() to get an element with a
> > unique ID.  Or use DOMTreeWalker to work with a subset of your
> > document.  I've never done that, so it's left as an exercise for the
> > student.
> >
> > -----Original Message-----
> > From: Javier Gálvez Guerrero [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, November 27, 2007 10:46 AM
> > To: [email protected]
> > Subject: Re: How to parse using DOM
> >
> > Thank you all very much, especially Sven, who shared his own work.
> >
> > However, I have looked at the samples on the repository site and I can't
> > find how to "extract" the data itself from the DOM tree. If you don't mind,
> > I would like to ask some simple questions, so that with your answers I can
> > start writing code.
> >
> > //get the DOM representation
> > DOMNode *doc = parser->getDocument();
> >
> > I cannot find the description of the getDocument method in the provided
> > documentation and I am quite confused about it. Anyway, *doc is supposed to
> > be the DOM representation. So, what I need to do is extract many elements
> > (with their children and attributes) from an XML file, which is supposed to
> > be represented by *doc once it has been parsed. Then, how can I retrieve,
> > let's say, the value of the "nickname" element inside its parent element
> > "user"? Does getDocument return the root node? If so, I guess I can ask it
> > for its children and then "move" through the tree with methods of the
> > DOMNode API, such as getNodeValue(), getChildNodes() and so on.
> >
> > Is that right? Is there any other way to extract data from the DOM
> > representation, or is this the one to use?
> >
> > Thank you all very much again and sorry for the inconvenience. I am really
> > interested in using Xerces in the application I am developing, so that's
> > why I would like to know how to use it properly.
> >
> > Cheers,
> > Javi
> >
> >
> > 2007/11/27, David Bertoni <[EMAIL PROTECTED]>:
> > >
> > > Sven Bauhan wrote:
> > > > Hi Javi,
> > > >
> > > > the Xerces interface is not really intuitive. A short description can be
> > > > found in the DOM programming guide:
> > > > http://xerces.apache.org/xerces-c/program.html
> > > >
> > > > The Xerces documentation often recommends using an extra class for
> > > > the conversion between std::string and XMLCh*. I have written such a
> > > > class. As it is quite short, I attach it here.
> > > Your class uses XMLString::transcode(), which transcodes to the local code
> > > page.  This will result in data loss in cases where content contains
> > > Unicode characters that are not representable in the local code page.  A
> > > better choice would be to transcode to UTF-8, which is compatible with
> > > char* APIs, and has the advantage that it can represent any Unicode
> > > character.
> > >
> > > There are many postings in the archives that will illustrate why using
> > > XMLString::transcode() is a bad idea.  I wish we would actually modify the
> > > analogous class in our samples so it doesn't do local code page
> > > transcoding, as it's providing a bad example.
> > >
> > > Dave
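
To illustrate why UTF-8 loses nothing where a local code page might, here is a hand-written UTF-16 to UTF-8 encoder in standard C++. It is a sketch only, not a Xerces API: it assumes the input is UTF-16 held in a std::u16string, and utf16_to_utf8 is a hypothetical name. Real code would more likely use a transcoder supplied by the library.

```cpp
#include <cstddef>
#include <string>

// Encode a UTF-16 string as UTF-8 (illustration only; not a Xerces API).
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: needs a following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
            {
                cp = 0xFFFD;
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // stray low surrogate
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Every Unicode code point maps to one to four bytes, and none of the bytes of a multi-byte sequence is zero, which is why the result can be passed through char* APIs that expect NUL-terminated strings.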
> > >
> >
>
