On 29/06/2010 13:27, Michel Fortin wrote:
On 2010-06-29 04:41:50 -0400, Alix Pexton
<alix.dot.pex...@gmail.dot.com> said:

On 28/06/2010 15:11, Steven Schveighoffer wrote:

Yes, I don't think the phobos solution needs to mimic exactly the API of
SAX or DOM, the author should be free to use D idioms. But starting with
a common proven design is probably a good idea.

-Steve

I've been thinking about it, and while I believe you when you say that
SAX can be used to build the DOM, I'm not convinced that SAX is the
lowest common abstraction.

Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

It is closer to the metal, but there's a catch...

One issue with SAX is that you must build an array of strings to pass
the attributes of an element, which will probably need a dynamic
allocation at some point. A lower-level abstraction such as mine (or
Tango's pull parser) just returns each attribute as a separate token as
it parses them.
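To make the contrast concrete, here is a rough sketch (in Python rather than D, for brevity) of a pull-style tokenizer that yields each attribute as its own token, so no per-element attribute array is ever built. The `tokenize` function and the `(kind, name, value)` token shape are hypothetical illustrations, not the API of Michel Fortin's tokenizer or of Tango's pull parser. It handles only a tiny XML subset (start and end tags, double-quoted attributes, text) and, like any plain tokenizer, performs no well-formedness checks.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token shape for illustration: (kind, name, value), where kind is
# "start", "attr", "text" or "end". Not an existing D, Phobos or Tango API.
Token = Tuple[str, str, str]

# Tiny XML subset only: no comments, CDATA, entities or self-closing tags.
_TAG = re.compile(r'<(/?)([A-Za-z_][\w.-]*)((?:\s+[A-Za-z_][\w.-]*="[^"]*")*)\s*>')
_ATTR = re.compile(r'([A-Za-z_][\w.-]*)="([^"]*)"')


def tokenize(xml: str) -> Iterator[Token]:
    """Lazily yield tokens; each attribute is a separate token.

    Nothing here checks tag balancing or duplicate attributes; that is
    left to whatever layer consumes the tokens.
    """
    pos = 0
    for m in _TAG.finditer(xml):
        text = xml[pos:m.start()]
        if text.strip():
            yield ("text", "", text)
        closing, name, attrs = m.group(1), m.group(2), m.group(3)
        if closing:
            yield ("end", name, "")
        else:
            yield ("start", name, "")
            # One token per attribute, emitted as it is parsed.
            for a in _ATTR.finditer(attrs):
                yield ("attr", a.group(1), a.group(2))
        pos = m.end()
    tail = xml[pos:]
    if tail.strip():
        yield ("text", "", tail)
```

For `<a x="1" y="2">hi</a>` it yields, one at a time: start `a`, attribute `x`, attribute `y`, text `hi`, end `a`.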

The downside of the tokenizer interface is that it only checks for a
subset of well-formedness; for instance, it doesn't check that tags
balance each other correctly or that no two attributes share the same
name. It's just a "tokenizer" after all; it can't be described as a
conformant XML parser by itself. The upper-layer parser needs to check
for these things. My mini DOM built on this tokenizer does these checks
when using the tokenizer, and it's more efficient to do them there
because that's where the context information is kept, which is why the
tokenizer doesn't do them.

Implementing SAX on top of my tokenizer consists mostly of ensuring
proper tag balancing, checking for duplicate attributes, and collecting
attributes in an array (or another kind of list) you can then give to
the openElement SAX callback.
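A minimal sketch of such a layer, assuming a flat `(kind, name, value)` token stream with kinds `start`, `attr`, `text` and `end`. `drive_sax`, `NotWellFormed` and the handler methods `open_element`, `text` and `close_element` are hypothetical names chosen for illustration. The point it shows: balancing and duplicate detection need a stack and a per-element set (context the tokenizer doesn't keep), and the attribute list handed to `open_element` is exactly the per-element allocation that SAX forces.

```python
from typing import Iterable, List, Tuple

# Assumed token shape (kind, name, value); hypothetical, not a Phobos API.
Token = Tuple[str, str, str]


class NotWellFormed(Exception):
    pass


def drive_sax(tokens: Iterable[Token], handler) -> None:
    """Turn a flat token stream into SAX-style callbacks on `handler`,
    adding the checks a tokenizer leaves out: tag balancing and
    duplicate attribute names."""
    stack: List[str] = []       # open elements, innermost last
    pending = None              # (name, attrs) of a start tag still collecting attributes
    seen = set()                # attribute names seen on the pending start tag

    def flush():
        # Deliver a completed start tag, with its collected attribute list.
        nonlocal pending
        if pending is not None:
            name, attrs = pending
            handler.open_element(name, attrs)
            stack.append(name)
            pending = None

    for kind, name, value in tokens:
        if kind == "attr":
            if pending is None:
                raise NotWellFormed(f"attribute {name!r} outside a start tag")
            if name in seen:
                raise NotWellFormed(f"duplicate attribute {name!r}")
            seen.add(name)
            pending[1].append((name, value))
            continue
        flush()
        if kind == "start":
            pending = (name, [])
            seen = set()
        elif kind == "text":
            handler.text(value)
        elif kind == "end":
            if not stack or stack[-1] != name:
                raise NotWellFormed(f"unexpected </{name}>")
            stack.pop()
            handler.close_element(name)
    flush()
    if stack:
        raise NotWellFormed(f"unclosed <{stack[-1]}>")


class Recorder:
    """Example handler that just records the events it receives."""
    def __init__(self):
        self.events = []

    def open_element(self, name, attrs):
        self.events.append(("open", name, attrs))

    def text(self, data):
        self.events.append(("text", data))

    def close_element(self, name):
        self.events.append(("close", name))
```

Feeding it start `a`, attribute `x="1"`, text `hi`, end `a` gives the events open `a` with `[("x", "1")]`, text `hi`, close `a`; a repeated attribute or a mismatched end tag raises `NotWellFormed`.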


My understanding was that SAX _doesn't_ check those things either, and that it was up to the code responding to the events to tackle well-formedness. After all, if SAX handled well-formedness, there would be no need for it to pass an argument to closeElement stating which element was being closed. SAX has its place, though: when it comes to doing a single-pass filter on a stream of XML that can be assumed to be well-formed, its simplicity is admittedly hard to beat. In other applications, however, there is much room for improvement. SAXplus, with built-in element memoisation, an element stack and a used-id list, sounds quite useful to me, as long as those features remain optional, of course.
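One way such optional extras might look is a wrapper that sits between plain SAX events and the user's handler. This Python sketch uses a hypothetical `SaxPlus` class and hypothetical `open_element`/`text`/`close_element` method names; it covers only the element stack and the used-id list (not memoisation), and each can be switched off. It treats any attribute literally named `id` as an ID, which is a simplification: in XML proper, ID-ness comes from the attribute's declared type in a DTD (or from `xml:id`).

```python
class SaxPlus:
    """Hypothetical 'SAXplus' layer wrapping a plain SAX-style handler.

    Optionally keeps an element stack (to verify closeElement names) and a
    set of used id values (to reject duplicates). Simplification: only an
    attribute literally named "id" is treated as an ID.
    """

    def __init__(self, inner, track_stack=True, track_ids=True):
        self.inner = inner
        self.stack = [] if track_stack else None
        self.ids = set() if track_ids else None

    def open_element(self, name, attrs):
        if self.ids is not None:
            for key, value in attrs:
                if key == "id":
                    if value in self.ids:
                        raise ValueError(f"duplicate id {value!r}")
                    self.ids.add(value)
        if self.stack is not None:
            self.stack.append(name)
        self.inner.open_element(name, attrs)

    def text(self, data):
        self.inner.text(data)

    def close_element(self, name):
        if self.stack is not None:
            # The stack makes the name passed to closeElement checkable.
            if not self.stack or self.stack[-1] != name:
                raise ValueError(f"mismatched </{name}>")
            self.stack.pop()
        self.inner.close_element(name)
```

With both options off it is a plain pass-through, so the single-pass filter case pays nothing for the extras.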

Admittedly, because I was disappointed when I first looked into SAX, I haven't followed it for some time.

Hmm, I suddenly got nostalgic for the days when XML was all shiny and new and everyone was writing their own APIs or butchering old SGML/HTML tech. Makes me want to go look at my old code ^^

A...