On 28/06/2010 13:04, Steven Schveighoffer wrote:
On Sun, 27 Jun 2010 14:56:21 -0400, Yao G. <nospam...@gmail.com> wrote:

I did a simple implementation of a pull parser, using this API as
reference: http://xmlpull.org/

But I used an iterator similar to the one used by Steve (from
dcollections) to parse the doc. It turns out that Tango did something
similar first (using iterator to parse the document), and seeing the
debacle caused by the Date module, I think it would be a bad idea to
release it.

Did you look at Tango's code in question, or look at their
documentation? If not, then you are fine.

I think any implementation is going to have to at least try to use
ranges or show why they are not a good idea for xml, since Andrei is set
on using ranges for everything.

BTW, I've not used std.xml or Tango's xml, but I agree that an xml
library is a very important part of today's standard libraries. Having
xml in the standard allows for so much usage of it in many other places
(serialization comes to mind immediately). If std.xml is bad (which I've
heard from several independent people), then throw it out and make
something new.

I myself have tried to think of how xml can be done with ranges, but I
believe one of the key elements is it has to parse xml without loading
the entire document to be efficient enough for some applications. A DOM
style parser which presents a range interface is probably fine, but a
lazy interface would be the best. Since XML is a tree style, you need a
range which allows moving down the tree. You almost need a stacking
range which can move down the tree and also to the next sibling element.
Ideally, the library should do as much as possible without allocating
anything but buffer space to read data.
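
To make the "stacking" idea concrete, here is a rough sketch of a lazy walker that reads the input in small chunks and tracks depth with an explicit stack. It is written in Python rather than D purely for illustration; the names `walk` and `chunk_size` are made up for this example, and ElementTree still builds nodes internally, so it only approximates the "allocate nothing but buffer space" goal.

```python
import io
import xml.etree.ElementTree as ET

def walk(stream, chunk_size=16):
    """Lazily yield (depth, tag) for each element start, feeding input chunk by chunk."""
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tags of the currently open elements; len(stack) is the depth
    while True:
        chunk = stream.read(chunk_size)
        if chunk:
            parser.feed(chunk)
        else:
            parser.close()  # end of input; queued events can still be read
        for event, elem in parser.read_events():
            if event == "start":
                yield len(stack), elem.tag
                stack.append(elem.tag)
            else:
                stack.pop()
                elem.clear()  # release the finished element's contents
        if not chunk:
            break

doc = "<a><b><c/></b><d/></a>"
# Moving to the "next sibling" is just skipping everything deeper than the
# level we care about; here we keep only the direct children of the root.
top_level = [tag for depth, tag in walk(io.StringIO(doc)) if depth == 1]
```

Because `walk` is a generator, nothing beyond the current chunk is parsed until the consumer asks for the next item, which is roughly what a range-based interface would offer.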

-Steve

I've not looked at any of the D XML offerings (shame on me?) but I've been having a bit of a look at the types of API that are available in other languages, and there seem to be three:

Event based a la SAX

Stream based a la StAX

Tree based a la "the" DOM
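
As a small illustration of how the three styles differ in use, the sketch below extracts the same attribute from one document three ways. It uses Python's standard library as a stand-in (xml.sax for the event style, ElementTree's iterparse as a rough analogue of StAX, minidom for the DOM); the document and the function names are invented for the example.

```python
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

DOC = "<library><book title='A'/><book title='B'/></library>"

# 1. Event based (SAX): the parser drives and calls back into our handler.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "book":
            self.titles.append(attrs["title"])

def titles_sax(doc):
    handler = TitleHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.titles

# 2. Stream based (StAX-like): our loop drives and pulls the next event.
def titles_pull(doc):
    titles = []
    source = io.BytesIO(doc.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag == "book":
            titles.append(elem.get("title"))
    return titles

# 3. Tree based (DOM): parse everything first, then query the tree.
def titles_dom(doc):
    tree = xml.dom.minidom.parseString(doc)
    return [node.getAttribute("title") for node in tree.getElementsByTagName("book")]
```

The main visible difference is who owns the loop: the parser in the first case, the caller in the second, and nobody in the third, since the whole document is already in memory.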

The simple conclusion that I have drawn is that there is no one-size-fits-all solution, and that it would therefore be a mistake to put all effort into supporting only one. (However, ranges do seem to match up quite nicely with the way that the Stream based APIs operate.)

It would seem to me most logical to consider the many varied use-cases and build a core API upon which all 3 types of XML processor can be built (or at least specify a core set of types to be used by all 3), rather than focus on implementing one particular style. Interoperability of all 3 styles would then be possible and perhaps facilitate the later implementation of higher abstractions (such as XPath and XQuery).
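
One possible shape for such a core, sketched in Python for illustration only: a shared `Event` type and a lazy `events` stream form the bottom layer, and the event-based and tree-based styles are thin layers over it. All names here (`Event`, `events`, `dispatch`, `Node`, `build_tree`) are hypothetical, text content is ignored to keep it short, and ElementTree is used as the tokenizer only for brevity.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple

class Event(NamedTuple):
    """Shared core type used by every layer."""
    kind: str   # "start" or "end"
    name: str
    attrs: dict

def events(doc: str) -> Iterator[Event]:
    """Core layer: a lazy stream of events (the pull / stream-style view)."""
    source = io.BytesIO(doc.encode("utf-8"))
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        yield Event(kind, elem.tag, dict(elem.attrib) if kind == "start" else {})

def dispatch(stream, on_start, on_end):
    """Event-based (SAX-style) layer: push each core event into callbacks."""
    for ev in stream:
        (on_start if ev.kind == "start" else on_end)(ev)

class Node:
    def __init__(self, name, attrs):
        self.name, self.attrs, self.children = name, attrs, []

def build_tree(stream):
    """Tree-based (DOM-style) layer: fold the same events into nodes."""
    root, stack = None, []
    for ev in stream:
        if ev.kind == "start":
            node = Node(ev.name, ev.attrs)
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        else:
            stack.pop()
    return root
```

Since both upper layers consume the same event stream, a caller could pull events until reaching an interesting subtree and then hand the rest of the stream to `build_tree`, which is the kind of interoperability described above.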

I think it is also important to remember that there are at least 4 different stages to processing XML (reading, validating, mutating, writing) and that many programming tasks allow one or more of these aspects to be ignored. This can mean that one programmer is blinded to the requirements of another in a different domain because the ways in which they work with XML either overlap only partially or not at all.

I've never used anything like SAX myself, though I have used the DOM quite a lot, and spent most of the time wishing it worked a bit more like StAX (even though I hadn't heard of StAX at the time ^^).

Whatever is done for D, it should allow programmers to work with XML in a way that is familiar to them and compatible with what others do. Memory should be used conservatively, and reprocessing (parsing the same portion of a document multiple times) should be minimised.

Most importantly, the implementation should be D-ey, rather than the abstraction used in any other language's most favoured solution, shoehorned into a D-shaped box.

A...
(whose 2 cents are worth no more or no less than anyone else's.)
