SAX is efficient: you handle document contents in a stream and act on
whatever interests you when you see it.  However, you may have to
maintain some state to know whether a particular element is of interest.
For example, if you want to extract the addresses of customers, you need
to know whether the <address> you're currently processing is the child
of a <customer> rather than, say, a <vendor>.
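
As a rough sketch of that bookkeeping (not Xerces code): the class and
method names below are hypothetical and take std::string so the example
needs no XML library.  With Xerces you would put similar logic in a
subclass of DefaultHandler, whose callbacks receive XMLCh strings.  It
also assumes <address> holds only text, with no child elements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler: the three public methods mirror the callbacks a
// SAX parser makes as it streams through the document.
class CustomerAddressCollector {
public:
    void startElement(const std::string& name) {
        path_.push_back(name);
        if (inCustomerAddress()) {
            current_.clear();
        }
    }

    void characters(const std::string& text) {
        // A parser may deliver one element's text in several chunks.
        if (inCustomerAddress()) {
            current_ += text;
        }
    }

    void endElement(const std::string& name) {
        assert(!path_.empty() && path_.back() == name);
        if (inCustomerAddress()) {
            addresses_.push_back(current_);
        }
        path_.pop_back();
    }

    const std::vector<std::string>& addresses() const { return addresses_; }

private:
    // True only while the innermost open element is an <address> whose
    // parent is a <customer>.
    bool inCustomerAddress() const {
        const std::size_t n = path_.size();
        return n >= 2 && path_[n - 1] == "address"
                      && path_[n - 2] == "customer";
    }

    std::vector<std::string> path_;       // names of the open elements
    std::vector<std::string> addresses_;  // customer addresses found so far
    std::string current_;                 // text of the address being read
};
```

The stack of open element names is the "state": an <address> under a
<vendor> fails the parent check and is ignored.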

DOM is easy: because the entire document is represented as a tree in
memory, you can query it.  In the example above, you'd look for
<customer> elements, and then process each one's <address> element.  The
downside is that DOM is comparatively inefficient: it takes time and
space to build the DOM tree.
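
To make the "query the tree" idea concrete, here is a minimal sketch
that uses a hypothetical Node struct as a stand-in for a DOM node; it is
not the Xerces API.  With Xerces you would call something like
getElementsByTagName() on the document and then examine each result's
children.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: a name, its text, and children.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Walk the whole tree; for each <customer>, collect the text of its
// direct <address> children.
void collectCustomerAddresses(const Node& node,
                              std::vector<std::string>& out) {
    if (node.name == "customer") {
        for (const Node& child : node.children) {
            if (child.name == "address") {
                out.push_back(child.text);
            }
        }
    }
    for (const Node& child : node.children) {
        collectCustomerAddresses(child, out);
    }
}
```

No state has to be carried between callbacks here, because the parent
and child are both available at once.  The price is that the whole tree
must exist before the query runs.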

Either way, attributes are accessible.  With SAX2, they're passed to
your startElement() handler along with the element; with DOM, you ask
the element node for them (for example, with getAttribute()).

If the sample applications don't convey enough to get you going, I'd
recommend finding a tutorial on XML, including SAX and/or DOM.  There's
a lot to understand, including issues of document encoding,
well-formedness versus validity, DTD versus XML Schema (and other
alternatives, such as RELAX NG, that Xerces doesn't support), namespaces,
and more.  In other words, there's good reason to be overwhelmed.  While XML
is conceptually simple, it's complex in practice.  This list can help
with specific questions, but there are better places to get the core
concepts.  I'm partial to reading the W3C's specifications, but while
they're authoritative and complete, they can be hard to comprehend, so
I'd recommend looking for a good book or online tutorial.

As for working with strings: XML documents are made up of text, so
that's what the APIs deal in.  If you use XML Schema to validate
documents, you can specify what data types attribute values and element
contents are intended to represent.  You'll still have to deal with
string representations of native types, but if you validate, you'll at
least know that the string is a valid representation of the type.  That
said, there's no guarantee that a given machine has a native type that
can store the represented value.  For instance, a Schema integer can
have any integral value; it's not limited to what fits in 32 or 64 bits
or any other size, though conforming processors are only required to
support 18 decimal digits.
See http://www.w3.org/TR/xmlschema-2/#integer.
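
Since the conversion from text is yours to do, it's worth doing it in a
way that reports failure, which atoi() can't.  The helpers below are a
minimal sketch using only the standard library; parseLong and
parseDouble are hypothetical names, and the sketch assumes the text has
already been transcoded to a narrow string and trimmed of whitespace.
Note that strtol() and strtod() don't accept exactly the same syntax as
the Schema types do, so this is no substitute for validation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Convert text to long.  Returns false for empty input, trailing junk,
// or a value too large or too small for a long.
bool parseLong(const std::string& text, long& out) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0') return false;  // not fully numeric
    if (errno == ERANGE) return false;                       // doesn't fit
    out = value;
    return true;
}

// Same idea for double.  strtod() honors the current C locale's decimal
// point, so this assumes the default "C" locale.
bool parseDouble(const std::string& text, double& out) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    if (end == text.c_str() || *end != '\0') return false;  // not fully numeric
    if (errno == ERANGE) return false;                       // overflow/underflow
    out = value;
    return true;
}
```

The ERANGE check is where the "no native type can hold it" case shows
up: a perfectly valid Schema integer can still fail to fit in a long.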


-----Original Message-----
From: Adrian Schubert [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 06, 2008 5:12 AM
To: [email protected]
Subject: Which parser / methods for non-string types?

Hi everyone,

Scanning the posts here, and having done some googling, I have to say 
I'm feeling a bit overwhelmed at the moment.
I'm very new at this whole XML/DOM/SAX terminology...!

It's very simple, actually: I need to write a program to read in a 
number of parameters from an XML file describing a particular satellite 
data format.
The XML files I'm talking about typically contain ~8000 elements, but 
I'll only need to extract a couple of dozen specific elements or so, 
buried in various places in the file.
My target platform is linux, GNU C++ compiler.
The idea: read various specific XML elements and attributes into my own 
data structures, so I can work with them.

Question (1):
What's the best way to extract a particular element value, where only 
its name is known?
And can I extract attribute values as easily as element values?

Question (2):
The data I need to extract can be strings, short/long integers, or 
single/double floats.
All the code examples I've seen so far have treated elements as strings.
What's the most elegant way to read in values of various types?
Do I really have to do after-the-fact conversions, for example using 
atoi()? That seems a bit odd.


With all that said, what parser should I use? DOM? SAX2? And why?
I've already managed to run a few of the xerces test programs on one of 
my XML files; it parses fine, I can count the elements, etc.
So at least I have some kind of ground to work on.

Can anyone here provide me with an example(s), esp. of how to extract 
specific element values of non-string type?

I hope I was clear...thank you for all help!
Adrian
