On 2010-10-19 16:43:04 -0400, sybrandy <sybra...@gmail.com> said:
I guess one question we need to ask is what do we expect from this library? Do we want a full DOM implementation or is a SAX parser good enough? Or do we need something in between? In PHP or Perl, perhaps both, I saw a library where an XML document was essentially transformed into nested associative arrays. It made it very easy to read data from the XML, however I don't know how much of the official standards it complied with.
Different people have different needs for XML, so it's hard to come up with something that pleases everyone. I might have a solution to that, however: a template that makes it easy to implement any kind of parser.
I wrote two XML modules a little while ago. The first is a tokenizer template that can work as a pull parser, as a callback parser, or as a mix of both, and it is reentrant (you can invoke the tokenizer inside a callback to parse new tokens). The implementation was written from the XML spec, so I'm confident that the parser is pretty much standard-conformant. Where it falls short of the standard: the tokenizer lacks support for DTD internal subsets and user-defined character entities, and it leaves some well-formedness checks (like checking that tag names match) to the upper layers, where those checks should be less costly.
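To illustrate the pull/callback/mixed idea, here is a minimal sketch in Python rather than D. It is not the xmltok API: the names `Token`, `tokens`, and `parse_with_callbacks` are hypothetical, and the toy grammar only handles attribute-less tags and text, so it is nowhere near XML-conformant.

```python
import re
from typing import Callable, Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # "open", "close", or "text"
    value: str  # tag name, or the text content


# Toy grammar (an assumption for this sketch): <name>, </name>, or raw text.
_TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


def tokens(source: str) -> Iterator[Token]:
    """Pull mode: the caller asks for one token at a time."""
    pos = 0
    while pos < len(source):
        m = _TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group(3) is not None:
            yield Token("text", m.group(3))
        else:
            yield Token("close" if m.group(1) else "open", m.group(2))


def parse_with_callbacks(
    stream: Iterator[Token],
    on_token: Callable[[Token, Iterator[Token]], None],
) -> None:
    """Callback mode, layered over the same token stream.

    The callback also receives the stream, so it can pull further
    tokens itself: the "mix of both" style.
    """
    for tok in stream:
        on_token(tok, stream)


seen = []


def on_token(tok: Token, stream: Iterator[Token]) -> None:
    if tok.kind == "open" and tok.value == "b":
        # Inside the callback, pull the next token directly.
        seen.append(("b-text", next(stream).value))
    else:
        seen.append((tok.kind, tok.value))


parse_with_callbacks(tokens("<a><b>hi</b>there</a>"), on_token)
# seen == [("open", "a"), ("b-text", "hi"), ("close", "b"),
#          ("text", "there"), ("close", "a")]
```

Note that nothing here checks that `</b>` closes a `<b>`; as in the design described above, that check is left to whatever sits on top of the token stream.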
The second module is a basic tree model built on the tokenizer. It doesn't try to be DOM-conformant, but it shows how the tokenizer can be used and implements the higher-level well-formedness checks (matching tag names). Building a SAX parser on top of the tokenizer would be a piece of cake too.
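As a rough picture of how a tree layer can do the tag-matching check on top of a token stream, here is a small Python sketch. It is an illustration only, not the mfr.xml module: `Element`, `build_tree`, and the `(kind, value)` token shape are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Element or str items


def build_tree(token_stream):
    """Build a tree from (kind, value) pairs, kind in open/close/text.

    The tokenizer is assumed not to check tag nesting, so the
    matching-tag-name check happens here, using the element stack
    this layer needs anyway.
    """
    root = None
    stack = []
    for kind, value in token_stream:
        if kind == "open":
            node = Element(value)
            if stack:
                stack[-1].children.append(node)
            elif root is None:
                root = node
            else:
                raise ValueError("more than one root element")
            stack.append(node)
        elif kind == "close":
            if not stack or stack[-1].name != value:
                expected = stack[-1].name if stack else None
                raise ValueError(
                    f"mismatched close tag </{value}> (expected {expected})"
                )
            stack.pop()
        elif kind == "text":
            if stack:
                stack[-1].children.append(value)
            elif value.strip():
                raise ValueError("text outside the root element")
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    if stack:
        raise ValueError(f"unclosed element <{stack[-1].name}>")
    if root is None:
        raise ValueError("no root element")
    return root


tree = build_tree([
    ("open", "a"), ("open", "b"), ("text", "hi"),
    ("close", "b"), ("close", "a"),
])
# tree == Element("a", [Element("b", ["hi"])])
```

The check is cheap at this level because the stack of open elements already exists for building the tree; a bare tokenizer would have to keep one just for validation.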
It might be incomplete, but this code works: it's already in production in a small program (script?) of mine. I don't really have the time to work on it at the moment, but if anyone wants to take it and improve upon it, then it could probably become Phobos's XML parser. One thing that should be done is to make the tokenizer accept ranges, something I started a couple of months ago but never finished.
Here's the (slightly outdated) documentation. If someone wants to proceed, I'll extract the code from the rest of my code and release it under the Boost license.
http://michelf.com/docs/d/mfr/xmltok.html
http://michelf.com/docs/d/mfr/xml.html

-- 
Michel Fortin
michel.for...@michelf.com
http://michelf.com/