Re: [due diligence] std.xml

Michel Fortin Tue, 19 Oct 2010 15:20:21 -0700

On 2010-10-19 16:43:04 -0400, sybrandy <sybra...@gmail.com> said:

I guess one question we need to ask is what do we expect from this library? Do we want a full DOM implementation or is a SAX parser good enough? Or do we need something in between? In PHP or Perl, perhaps both, I saw a library where an XML document was essentially transformed into nested associative arrays. It made it very easy to read data from the XML, however I don't know how much of the official standards it complied with.

People have many different needs for XML, so it's hard to come up with something that pleases everyone. I might have a solution to that, however: a template that makes it easy to implement any kind of parser.

I wrote two XML modules a little while ago. The first is a tokenizer template that can work either as a pull parser or as a callback parser, or even as a mix of both, and it is reentrant (you can invoke the tokenizer inside a callback to parse new tokens). The implementation was written from the XML spec, so I'm confident that the parser is pretty much standard-compliant. With regard to the standard, the tokenizer lacks support for DTD internal subsets and user-defined character entities, and it leaves some well-formedness checks to the upper layers (such as checking that tag names match), where those checks should be less costly to perform.
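To make the pull/callback distinction concrete, here is a minimal sketch. It is written in Python for brevity and is not the API of the tokenizer module described above: the names `tokens` and `dispatch` are made up for illustration, and the pattern only understands `<name>`, `</name>`, `<name/>` and plain text (no attributes, comments, entities or DTDs).

```python
import re

# Hypothetical, deliberately tiny token grammar: <name>, </name>, <name/>, text.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")

def tokens(source):
    """Pull mode: yield (kind, value) tuples one at a time, on demand."""
    pos = 0
    while pos < len(source):
        m = _TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        closing, name, empty, text = m.groups()
        if text is not None:
            yield ("text", text)
        elif closing:
            if empty:
                raise ValueError(f"malformed closing tag at offset {m.start()}")
            yield ("close", name)
        else:
            yield ("open", name)
            if empty:
                # <name/> is reported as an open immediately followed by a close.
                yield ("close", name)

def dispatch(token_iter, on_open=None, on_close=None, on_text=None):
    """Callback mode: drain any iterable of tokens into the given handlers."""
    handlers = {"open": on_open, "close": on_close, "text": on_text}
    for kind, value in token_iter:
        handler = handlers[kind]
        if handler is not None:
            handler(value)

# Mixed use: pull the first token by hand, then let callbacks handle the rest.
it = tokens("<a>hi<b/></a>")
first = next(it)
texts = []
dispatch(it, on_text=texts.append)
```

Because the callback layer only consumes an iterator, the caller can pull a few tokens manually and hand the remainder to callbacks. This sketch does not attempt to show reentrancy.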

The second module is a basic tree model built on the tokenizer. It doesn't try to be DOM-conformant, but it shows how the tokenizer can be used and it implements the higher-level well-formedness checks (matching tag names). Building a SAX parser on top of the tokenizer would be a piece of cake too.
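As an illustration of where the tag-matching check naturally lives, here is a minimal sketch of a tree builder that consumes a token stream and verifies that closing tags match. It is Python, not the actual tree module; `Element`, `build_tree` and the `(kind, value)` token shape are assumptions made up for this example.

```python
class Element:
    """Hypothetical tree node: a tag name plus child elements and text."""
    def __init__(self, name):
        self.name = name
        self.children = []  # Element instances or plain text strings

def build_tree(tokens):
    """Build a tree from (kind, value) tokens, checking that tags match."""
    root = Element(None)  # synthetic document node, never closed by a tag
    stack = [root]
    for kind, value in tokens:
        if kind == "open":
            node = Element(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "close":
            # The well-formedness check: a closing tag must match the
            # innermost element that is still open.
            if len(stack) == 1 or stack[-1].name != value:
                raise ValueError(f"unexpected closing tag </{value}>")
            stack.pop()
        elif kind == "text":
            stack[-1].children.append(value)
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1].name}>")
    return root

doc = build_tree([("open", "a"), ("text", "hi"), ("open", "b"),
                  ("close", "b"), ("close", "a")])
```

The stack of open elements is needed to build the tree anyway, so comparing names at this layer costs one string comparison per closing tag, whereas the tokenizer would have to keep a stack of its own just for the check.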

It might be incomplete, but this code works: it's already in production in a small program (script?) of mine. I don't really have the time to work on it at the moment, but if anyone wants to take it and improve upon it, it could probably become Phobos's XML parser. One thing that should be done is to make the tokenizer accept ranges, something I started a couple of months ago but never finished.

Here's the (slightly outdated) documentation. If someone wants to proceed, I'll extract the code from the rest of my code and release it under the Boost license.


http://michelf.com/docs/d/mfr/xmltok.html
http://michelf.com/docs/d/mfr/xml.html

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
