On Wed, Mar 5, 2008 at 9:43 PM, Shachar Shemesh <[EMAIL PROTECTED]> wrote:
> Gilad Ben-Yossef wrote:
>
>  > Amos Shapira wrote:
>  >
>  >>
>  >> What do people around here like to use for EFFICIENT XML parsing?
>  >>
>  >
>  > Isn't Efficient XML an oxymoron?
>  >
>  > Seriously, and despite the flame bait way I've introduced the subject,
>  > if you need to do XML parsing in a way which is more efficient than
>  > Xerces, maybe it is an indication that XML is not a proper way to
>  > encode your data.
>  I'll bite.

Thanks to everyone for your answers.

I'm replying to Shachar's reply because it is the closest to what I
have to add, and I'm including some more details about my question
that I've learned since I sent it.

>
>  Without knowing Xerces too deeply, I think you can do MUCH faster than
>  it, by feeding the schema before hand. Theoretically (though, the last

Xerces is apparently "the Lincoln of XML parsers", i.e. it supports
everything there is to support in the standard, but it comes with a
huge weight attached. On my desktop it's the 9th largest library at
almost 4 MB; it comes just before libkhtml and is twice the size of
libc. But library size is not all I can say against it: it adheres to
the standard approaches of DOM (tons of objects, lots of memory) or SAX
(i.e. the calling code has to handle each event manually).
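To illustrate what "handle each event manually" means, here is a minimal sketch in C++. The EventHandler interface, the NameCollector class and replaySampleEvents() are all made up for this example (this is not the Xerces API), and the "parser" just replays a fixed sequence of events. The point is that the handler has to keep track of where it is in the document by itself:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal SAX-like callback interface (NOT the Xerces API).
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Collects the text of every <name> element. All the state tracking
// ("am I inside <name> right now?") is the handler's own job.
class NameCollector : public EventHandler {
public:
    void startElement(const std::string& name) override {
        if (name == "name") {
            inName_ = true;
            current_.clear();
        }
    }
    void characters(const std::string& text) override {
        if (inName_) current_ += text;  // text may arrive in several chunks
    }
    void endElement(const std::string& name) override {
        if (name == "name") {
            inName_ = false;
            names_.push_back(current_);
        }
    }
    const std::vector<std::string>& names() const { return names_; }

private:
    bool inName_ = false;
    std::string current_;
    std::vector<std::string> names_;
};

// Stand-in for a real parser: replays the events an event-based parser
// would emit for
//   <people><person><name>Ada</name></person>
//           <person><name>Linus</name></person></people>
inline void replaySampleEvents(EventHandler& h) {
    h.startElement("people");
    h.startElement("person");
    h.startElement("name");
    h.characters("A");
    h.characters("da");  // split on purpose, as parsers are allowed to do
    h.endElement("name");
    h.endElement("person");
    h.startElement("person");
    h.startElement("name");
    h.characters("Linus");
    h.endElement("name");
    h.endElement("person");
    h.endElement("people");
}
```

Calling replaySampleEvents() on a NameCollector leaves the two names in names(). Every extra element you care about means another flag or state variable in the handler, which is the part I find tedious.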

There are a few newer approaches to parsing XML files; there is a pretty
good list at http://en.wikipedia.org/wiki/Xml_parser#Processing_XML_files

The one that appeals to me the most is "Data Binding"
(http://en.wikipedia.org/wiki/Xml_parser#Data_binding). As Shachar
describes below, it's based on a program which reads the schema and
generates code (in my case, C++ classes) which reads files of this
specific schema. The resulting objects are strongly-typed in-memory
representations of the data in the XML file and provide convenient
accessors.

Presumably, because these classes are schema-specific, they can cut a
lot of checks for irrelevant execution paths.
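A rough sketch of the kind of class I have in mind follows. Everything in it is an assumption for illustration: the Order class, its fields, and the <order> document shape are invented, and the naive string search in textOf() only stands in for whatever parsing code a real binding compiler would generate (it is not a conforming XML parser and is not the output of any actual tool):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical class of the sort a binding compiler might generate from a
// schema describing:
//   <order><id>..</id><customer>..</customer><total>..</total></order>
class Order {
public:
    // Reads exactly the elements the (assumed) schema declares and
    // converts each one to its declared type.
    static Order parse(const std::string& xml) {
        Order o;
        o.id_ = std::stoi(textOf(xml, "id"));
        o.customer_ = textOf(xml, "customer");
        o.total_ = std::stod(textOf(xml, "total"));
        return o;
    }

    // Strongly-typed accessors: no string lookups in the calling code.
    int id() const { return id_; }
    const std::string& customer() const { return customer_; }
    double total() const { return total_; }

private:
    // Placeholder for generated parsing code: returns the text between
    // <tag> and </tag>, or throws if the element is missing.
    static std::string textOf(const std::string& xml, const std::string& tag) {
        const std::string open = "<" + tag + ">";
        const std::string close = "</" + tag + ">";
        const std::string::size_type begin = xml.find(open);
        if (begin == std::string::npos)
            throw std::runtime_error("missing <" + tag + ">");
        const std::string::size_type start = begin + open.size();
        const std::string::size_type end = xml.find(close, start);
        if (end == std::string::npos)
            throw std::runtime_error("unterminated <" + tag + ">");
        return xml.substr(start, end - start);
    }

    int id_ = 0;
    std::string customer_;
    double total_ = 0.0;
};
```

The calling code would then be just Order::parse(text) followed by o.id(), o.customer() and o.total(), with the types already checked at compile time.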

If you ever wrote XDR/RPC stuff (I'm talking about what NFS and
friends use for network-level representation) then it might be
something similar: there was a program which converted a
language-independent data description into language-specific code
to marshal and unmarshal the data (only I forgot the name of the XDR
compiler right now).
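For those who haven't seen it, here is a hand-written sketch of the kind of marshalling code I mean. The Entry record and the function names are invented for the example and this is not the output of any real XDR compiler. It only follows the two XDR encoding rules I'm sure of: integers are 4 bytes in big-endian order, and strings are a 4-byte length followed by the bytes, zero-padded to a multiple of 4:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record; a real one would come from a data description file.
struct Entry {
    std::uint32_t inode;
    std::string name;
};

using Buffer = std::vector<std::uint8_t>;

// Appends a 32-bit unsigned integer in big-endian byte order.
inline void putU32(Buffer& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Reads a big-endian 32-bit unsigned integer and advances pos.
inline std::uint32_t getU32(const Buffer& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4)
        throw std::runtime_error("truncated integer");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | in[pos++];
    return v;
}

inline Buffer marshal(const Entry& e) {
    Buffer out;
    putU32(out, e.inode);
    putU32(out, static_cast<std::uint32_t>(e.name.size()));
    for (char c : e.name)
        out.push_back(static_cast<std::uint8_t>(c));
    while (out.size() % 4 != 0)  // pad the string to a 4-byte boundary
        out.push_back(0);
    return out;
}

inline Entry unmarshal(const Buffer& in) {
    std::size_t pos = 0;
    Entry e;
    e.inode = getU32(in, pos);
    const std::uint32_t len = getU32(in, pos);
    if (len > in.size() - pos)
        throw std::runtime_error("truncated string");
    e.name.assign(reinterpret_cast<const char*>(in.data() + pos), len);
    return e;
}
```

A generated data binding class for XML would play the same role as marshal()/unmarshal() here: the knowledge of the format lives in generated code instead of in a generic library.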

The snag about Data Binding is that all the implementations I found so
far are either for Java or proprietary and cost a fortune (thousands
of dollars per developer seat, where you have to buy a license for
every developer who links his code with the output of the programs).

Ah - and our final programs (the ones we ship to customers) have to
support all sorts of UNIX variants, and Windows, not just Linux.

The only one which keeps our hopes alive is xmlbeansxx
(http://xmlbeansxx.touk.pl/). I'm struggling with getting it to
compile and run for now.

Another one is CodeSynthesis XSD
(http://www.codesynthesis.com/products/xsd/), but it's GPL so we can't
link it with our proprietary code.

Here is a pretty complete list of XML Data Binding resources, almost
all options for C/C++ are commercial:
http://www.rpbourret.com/xml/XMLDataBinding.htm

Thanks again for everyone's input.

--Amos

