On 2015-05-04 21:14, Jonathan M Davis wrote:

If I were doing it, I'd do three types of parsers:

1. A parser that was pretty much as low level as you can get, where you
basically get a range of XML attributes or tags. Exactly how to build that
could be a bit entertaining, since it would have to be hierarchical, and
ranges aren't, but something like a range of tags where you can get a
range of its attributes and sub-tags from it so that the whole document
can be processed without actually getting to the level of even a SAX
parser. That parser could then be used to build the other parsers, and
anyone who needed insanely fast speeds could use it rather than the SAX
or DOM parser so long as they were willing to pay the inevitable loss in
user-friendliness.

2. SAX parser built on the low level parser.

3. DOM parser built either on the low level parser or the SAX parser
(whichever made more sense).

I doubt that I'm really explaining the low level parser well enough or
have even thought through it enough, but I really think that even a SAX
parser is too high level for the base parser and that something that is
slightly higher than a lexer (high enough to actually be processing XML
rather than individual tokens but pretty much only as high as is
required to do that) would be a far better choice.

IIRC, Michel Fortin's work went in that direction, and he linked to his
code in another post, so I'd suggest at least looking at that for ideas.

This is how the XML parser is structured in Tango: a pull parser at the lowest level, a SAX parser on top of that, and I think the DOM parser builds on top of the pull parser.
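To illustrate the layering, here is a rough sketch in Python of a SAX-style driver sitting on top of a stream of pull tokens. This is not Tango's API: the `(kind, name, value)` token shape, `drive_sax`, and the `Recorder` handler are all names I made up for the example, and the token kinds simply reuse the names of the pull parser's token types.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical token shape: (kind, name, value). Not Tango's actual type.
Token = Tuple[str, str, str]


class Recorder:
    """Hypothetical SAX-style handler that just records the callbacks it gets."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def start_element(self, name: str, attrs: Dict[str, str]) -> None:
        self.events.append(("start", name, dict(attrs)))

    def end_element(self, name: str) -> None:
        self.events.append(("end", name))

    def characters(self, text: str) -> None:
        self.events.append(("chars", text))


def drive_sax(tokens: Iterable[Token], handler) -> None:
    """Turn a flat pull-token stream into SAX-style callbacks."""
    # A start tag is held back until its attribute tokens have been collected.
    pending: Optional[Tuple[str, Dict[str, str]]] = None

    def flush() -> None:
        nonlocal pending
        if pending is not None:
            handler.start_element(*pending)
            pending = None

    for kind, name, value in tokens:
        if kind == "start element":
            flush()
            pending = (name, {})
        elif kind == "attribute":
            if pending is None:
                raise ValueError("attribute token outside a start tag")
            pending[1][name] = value
        elif kind in ("end element", "end empty element"):
            flush()
            handler.end_element(name)
        elif kind in ("data", "cdata"):
            flush()
            handler.characters(value)
        else:
            flush()  # comment, doctype, pi: ignored in this sketch
    flush()
```

The driver keeps no tree and only one pending start tag, so the SAX layer adds very little on top of the pull layer. A DOM builder could consume the same token stream directly.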

The Tango pull parser can give you the following tokens:

* start element
* attribute
* end element
* end empty element
* data
* comment
* cdata
* doctype
* pi
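
For a feel of what consuming such a token stream looks like, here is a minimal pull-tokenizer sketch in Python. It is not the Tango implementation or its API: the function name `pull` and the `(kind, name, value)` tuples are my own assumptions. It assumes well-formed input, skips whitespace-only text, does no entity decoding, and does not handle `>` inside attribute values or a DOCTYPE with an internal subset.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token shape: (kind, name, value). Not Tango's actual type.
Token = Tuple[str, str, str]

_NAME = re.compile(r"[^\s/>]+")
_ATTR = re.compile(r'''([^\s=/>]+)\s*=\s*("[^"]*"|'[^']*')''')


def pull(xml: str) -> Iterator[Token]:
    """Lazily yield tokens using the nine kinds listed above."""
    pos, n = 0, len(xml)
    while pos < n:
        if xml[pos] != "<":
            # Character data up to the next markup.
            end = xml.find("<", pos)
            if end == -1:
                end = n
            text = xml[pos:end]
            if text.strip():  # whitespace-only text is dropped in this sketch
                yield ("data", "", text)
            pos = end
        elif xml.startswith("<!--", pos):
            end = xml.index("-->", pos)
            yield ("comment", "", xml[pos + 4:end])
            pos = end + 3
        elif xml.startswith("<![CDATA[", pos):
            end = xml.index("]]>", pos)
            yield ("cdata", "", xml[pos + 9:end])
            pos = end + 3
        elif xml.startswith("<?", pos):
            end = xml.index("?>", pos)
            yield ("pi", "", xml[pos + 2:end])
            pos = end + 2
        elif xml.startswith("<!DOCTYPE", pos):
            end = xml.index(">", pos)
            yield ("doctype", "", xml[pos + 9:end].strip())
            pos = end + 1
        elif xml.startswith("</", pos):
            end = xml.index(">", pos)
            yield ("end element", xml[pos + 2:end].strip(), "")
            pos = end + 1
        else:
            # Start tag: emit the element, then one token per attribute.
            end = xml.index(">", pos)
            body = xml[pos + 1:end]
            empty = body.endswith("/")
            if empty:
                body = body[:-1]
            m = _NAME.match(body)
            if m is None:
                raise ValueError(f"malformed tag at offset {pos}")
            name = m.group(0)
            yield ("start element", name, "")
            for a in _ATTR.finditer(body, m.end()):
                yield ("attribute", a.group(1), a.group(2)[1:-1])
            if empty:
                yield ("end empty element", name, "")
            pos = end + 1


if __name__ == "__main__":
    for token in pull('<a x="1"><b/>hi</a>'):
        print(token)
```

Because `pull` is a generator, the caller decides how far to read and nothing is allocated beyond the current token, which is the main attraction of the pull model.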

--
/Jacob Carlborg
