Re: Status of std.xml (D2/Phobos)

Alix Pexton Mon, 28 Jun 2010 07:01:06 -0700

On 28/06/2010 13:04, Steven Schveighoffer wrote:

On Sun, 27 Jun 2010 14:56:21 -0400, Yao G. <nospam...@gmail.com> wrote:

I did a simple implementation of a pull parser, using this API as
reference: http://xmlpull.org/

But I used an iterator similar to the one used by Steve (from
dcollections) to parse the doc. It turns out that Tango did something
similar first (using iterator to parse the document), and seeing the
debacle caused by the Date module, I think it would be a bad idea to
release it.


Did you look at Tango's code in question, or look at their
documentation? If not, then you are fine.

I think any implementation is going to have to at least try to use
ranges or show why they are not a good idea for xml, since Andrei is set
on using ranges for everything.

BTW, I've not used std.xml or tango's xml, but I agree that an xml
library is a very important part of today's standard libraries. Having
xml in the standard allows for so much usage of it in many other places
(serialization comes to mind immediately). If std.xml is bad (which I've
heard from several independent people), then throw it out and make
something new.

I myself have tried to think of how xml can be done with ranges, but I
believe one of the key elements is it has to parse xml without loading
the entire document to be efficient enough for some applications. A DOM
style parser which presents a range interface is probably fine, but a
lazy interface would be the best. Since XML is a tree style, you need a
range which allows moving down the tree. You almost need a stacking
range which can move down the tree and also to the next sibling element.
Ideally, the library should do as much as possible without allocating
anything but buffer space to read data.

-Steve

I've not looked at any of the D XML offerings (shame on me?) but I've been having a bit of a look at the types of API that are available in other languages, and there seem to be 3...


Event based a la SAX

Stream based a la StAX

Tree based a la "the" DOM

The simple conclusion that I have drawn is that there is no one-size-fits-all solution, and that it would therefore be a mistake to put all effort into supporting only one. (However, ranges do seem to match up quite nicely with the way that the Stream based APIs operate.)
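To make the three styles concrete, here is a minimal sketch that extracts the same data three ways. It uses Python's standard library purely for illustration, because it ships all three styles (`xml.sax`, `xml.dom.pulldom`, `xml.dom.minidom`); nothing here is a proposed D API. The sample document and the `titles_*` helper names are assumptions made up for this example.

```python
import xml.sax
from xml.dom import minidom, pulldom

# Hypothetical sample document, used only for this illustration.
DOC = "<library><book id='1'>D</book><book id='2'>XML</book></library>"


# Event based (SAX style): the parser drives and calls back into our handler.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False

    def startElement(self, name, attrs):
        self._in_book = name == "book"
        if self._in_book:
            self.titles.append("")

    def characters(self, content):
        if self._in_book:
            self.titles[-1] += content

    def endElement(self, name):
        self._in_book = False


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


# Stream based (StAX style): the caller drives and pulls one event at a time.
def titles_pull(text):
    titles = []
    in_book = False
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            in_book = True
            titles.append("")
        elif event == pulldom.CHARACTERS and in_book:
            titles[-1] += node.data
        elif event == pulldom.END_ELEMENT and node.tagName == "book":
            in_book = False
    return titles


# Tree based (DOM style): the whole document is loaded, then queried.
def titles_dom(text):
    doc = minidom.parseString(text)
    return [
        "".join(c.data for c in book.childNodes if c.nodeType == c.TEXT_NODE)
        for book in doc.getElementsByTagName("book")
    ]


if __name__ == "__main__":
    print(titles_sax(DOC), titles_pull(DOC), titles_dom(DOC))
```

The pull version is the one that resembles a range: the `for` loop asks for the next item only when it wants one, which is roughly what an input range's `front`/`popFront` pair would do.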

It would seem to me most logical to consider the many varied use-cases and build a core API upon which all 3 types of XML processor can be built (or at least specify a core set of types to be used by all 3), rather than focus on implementing one particular style. Interoperability of all 3 styles would then be possible and perhaps facilitate the later implementation of higher abstractions (such as XPath and XQuery).
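One possible reading of "a core set of types" is a single event type produced lazily by a pull layer, with the event-based and tree-based styles built as thin adapters on top. The sketch below is only that, a sketch under stated assumptions: `Event`, `Node`, `pull`, `push` and `build_tree` are hypothetical names, attributes and namespaces are ignored, and Python's `xml.dom.pulldom` stands in for a real tokenizer.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional
from xml.dom import pulldom


@dataclass
class Event:
    """Hypothetical core type shared by all three styles."""
    kind: str   # "start", "text" or "end"
    value: str  # tag name for start/end, character data for text


@dataclass
class Node:
    """Hypothetical minimal tree node (attributes omitted for brevity)."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def pull(text: str) -> Iterator[Event]:
    """Stream style: lazily yield core events; the caller drives."""
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT:
            yield Event("start", node.tagName)
        elif event == pulldom.CHARACTERS:
            yield Event("text", node.data)
        elif event == pulldom.END_ELEMENT:
            yield Event("end", node.tagName)


def push(events: Iterable[Event], on_event: Callable[[Event], None]) -> None:
    """Event style: drain the stream and call back for every event."""
    for event in events:
        on_event(event)


def build_tree(events: Iterable[Event]) -> Optional[Node]:
    """Tree style: fold the stream into nodes, tracking depth with a stack."""
    root = None
    stack: List[Node] = []
    for event in events:
        if event.kind == "start":
            node = Node(event.value)
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        elif event.kind == "text" and stack:
            stack[-1].text += event.value
        elif event.kind == "end":
            stack.pop()
    return root


if __name__ == "__main__":
    tree = build_tree(pull("<a><b>x</b><b>y</b></a>"))
    print(tree.name, [child.text for child in tree.children])
```

Because both adapters consume the same event stream, a program could pull events until it reaches the subtree it cares about and hand only the remainder to `build_tree`, which is the kind of interoperability between styles described above.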

I think it is also important to remember that there are at least 4 different stages to processing XML (reading, validating, mutating, writing) and that many programming tasks allow one or more of these aspects to be ignored. This can mean that one programmer is blinded to the requirements of another in a different domain because the ways in which they work with XML either overlap only partially or not at all.

I've never used anything like SAX myself, though I have used the DOM quite a lot, and spent most of the time wishing it worked a bit more like StAX (even though I hadn't heard of StAX at the time ^^).

Whatever is done for D, it should allow programmers to work with XML in a way that is familiar to them and compatible with what others do. Memory should be used conservatively, and reprocessing (parsing the same portion of a document multiple times) should be minimised.

Most importantly, the implementation should be D-ey, rather than the abstraction used in any other language's most favoured solution, shoehorned into a D-shaped box.


A...
(whose 2 cents are worth no more or no less than anyone else's.)
