Re: Phobos Proposal: replace std.xml with kxml.

Andrei Alexandrescu Tue, 04 May 2010 16:45:14 -0700

Michel Fortin wrote:

On 2010-05-04 12:09:29 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:
Graham Fawcett wrote:
By "adapt" do you mean writing a wrapper for an existing library, or translating the source code of the library into D? What constitutes a "generous license" in this context? (For what it's worth, libxml2 is under the MIT License.)
Graham
We'd need to modify the code. I haven't looked into available XML libraries so I don't know which would be eligible.
I think if you wanted to port an XML library to make use of ranges, the only viable option is probably to find one based on C++ iterators. Otherwise it'll look more like a rewrite than a port, and at this point why not write one from scratch?

Design is also a considerable time expense, though I agree that use of ranges may actually improve the design too.

Anyway, just in case, would you be interested in an XML tokenizer and simple DOM following this model?
    http://michelf.com/docs/d/mfr/xmltok.html
    http://michelf.com/docs/d/mfr/xml.html
At the base is a pull parser and an event parser mixed in the same function template, "tokenize", allowing you to alternate between event-based and pull parsing at will. I'm using it, but its development is on hold at this time; I'm just maintaining it so it compiles on the newest versions of DMD.


Sounds great, but I need to defer XML expertise to others.

The only thing it doesn't parse at this time is inline DTDs inside the doctype.
Also, it currently works only with strings, for simplicity and performance. There is one issue about non-string parsing: when parsing a string, it's easy to just slice the string and move it around, but if you're parsing from a generic input range, you basically have to copy characters one by one, which is much less efficient. So ideally the algorithm should use slices whenever it can (when the input is a string).
I'm not sure yet how to attack this problem, but I'm thinking that perhaps parsing primitives should be "part of" the range interface. I say this in the sense that a range should provide specialized implementations of the primitives when it can implement them more efficiently (for example, by slicing). You wrote a while ago about designing parsing primitives; is this part of Phobos now?
Anyway, the problem above is probably the one reason we might want to write the parser from scratch: it needs to bind to specializable higher-level parsing functions to take advantage of the performance characteristics of certain ranges, such as those you can slice.
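
To make the slicing-versus-copying distinction concrete, here is a minimal sketch in D of one such specializable primitive. The name readUntil is hypothetical (it is not an existing Phobos function or part of the tokenizer linked above), and the sketch assumes an ASCII delimiter. It slices when the input is a string and falls back to copying element by element for any other input range.

```d
import std.array;
import std.range;
import std.traits;

/// Hypothetical primitive: consume `input` up to (not including) the
/// ASCII delimiter `stop` and return the consumed text.
auto readUntil(R)(ref R input, char stop)
    if (isInputRange!R)
{
    static if (isSomeString!R)
    {
        // String input: locate the delimiter and slice, copying nothing.
        size_t i = 0;
        while (i < input.length && input[i] != stop)
            ++i;
        auto token = input[0 .. i];
        input = input[i .. $];
        return token;
    }
    else
    {
        // Generic input range: the elements must be copied one by one.
        auto buf = appender!(ElementType!R[])();
        while (!input.empty && input.front != stop)
        {
            buf.put(input.front);
            input.popFront();
        }
        return buf.data;
    }
}

unittest
{
    import std.algorithm : filter;

    // Sliced path: the result aliases the original string.
    string s = "name>rest";
    assert(readUntil(s, '>') == "name");
    assert(s == ">rest");

    // Copying path: a filtered range cannot be sliced.
    auto r = "na me>rest".filter!(c => c != ' ');
    assert(readUntil(r, '>') == "name"d);
    assert(r.front == '>');
}
```

The caller sees one function either way; only the string instantiation avoids the copy, which is the performance characteristic described above.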

There are a number of issues. One is that you should allow wchar and dchar in addition to char as basic character types (probably ubyte too, for exotic encodings). In essence the char type should be a template parameter. The other is that perhaps you could use zero-based slices, i.e. s[0 .. i], as opposed to arbitrary slices s[i .. j]. A zero-based slice can be supported better than an arbitrary one.
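
As a rough illustration of both points, the sketch below takes the character type as a template parameter and relies only on a zero-based slice s[0 .. n] to extract a token. The names isAsciiLetter and nameLength are hypothetical, and the ASCII-letter test is only a stand-in for a real XML name-character check.

```d
import std.traits : isSomeChar;

/// Stand-in for a real XML name-character test: ASCII letters only.
bool isAsciiLetter(dchar c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/// Hypothetical helper: number of leading code units that form a name.
/// Char is a template parameter, so char, wchar and dchar input all work.
size_t nameLength(Char)(const(Char)[] s)
    if (isSomeChar!Char)
{
    size_t i = 0;
    while (i < s.length && isAsciiLetter(s[i]))
        ++i;
    return i;
}

unittest
{
    string  a = "title>Hi";
    wstring w = "title>Hi"w;
    dstring d = "title>Hi"d;

    // Only zero-based slices s[0 .. n] are needed to take the token.
    assert(a[0 .. nameLength(a)] == "title");
    assert(w[0 .. nameLength(w)] == "title"w);
    assert(d[0 .. nameLength(d)] == "title"d);
}
```

A source that can only hand out a prefix and then drop it could support this usage without offering arbitrary s[i .. j] slicing.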



Andrei
