Michel Fortin wrote:
On 2010-05-04 12:09:29 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

Graham Fawcett wrote:
By "adapt" do you mean writing a wrapper for an existing library, or translating the source code of the library into D? What constitutes a "generous license" in this context? (For what it's worth, libxml2 is under the MIT License.)

Graham

We'd need to modify the code. I haven't looked into available xml libraries so I don't know which would be eligible.

I think if you wanted to port an XML library to make use of ranges, the only viable option would probably be to find one based on C++ iterators. Otherwise it'll look more like a rewrite than a port, and at that point why not write one from scratch?

Design is also a considerable time expense, though I agree that use of ranges may actually improve the design too.

Anyway, just in case, would you be interested in an XML tokenizer and simple DOM following this model?

    http://michelf.com/docs/d/mfr/xmltok.html
    http://michelf.com/docs/d/mfr/xml.html

At the base is a pull parser and an event parser mixed in the same function template, "tokenize", allowing you to alternate between event-based and pull parsing at will. I'm using it, but its development is on hold at this time; I'm just maintaining it so it compiles with the newest versions of DMD.
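As a rough illustration of mixing the two styles (this is not the actual xmltok API, whose signatures aren't reproduced here), the C++ sketch below gives a single tokenizer both a pull method, `next()`, and an event-style `tokenize(handler)` that stops as soon as the handler returns false, so the caller can switch back to pulling. `Tokenizer`, `Token` and `TokKind` are hypothetical names, and only bare `<name>`, `</name>` and text are recognized: no attributes, entities or error reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds for this sketch only.
enum class TokKind { OpenTag, CloseTag, Text, End };

struct Token {
    TokKind kind;
    std::string_view text;  // a slice of the input, not a copy
};

class Tokenizer {
public:
    explicit Tokenizer(std::string_view in) : in_(in) {}

    // Pull interface: return the next token, or End when the input is exhausted.
    Token next() {
        if (pos_ >= in_.size()) return {TokKind::End, {}};
        if (in_[pos_] == '<') {
            std::size_t close = in_.find('>', pos_);
            if (close != std::string_view::npos) {
                bool isClose = pos_ + 1 < close && in_[pos_ + 1] == '/';
                std::size_t nameStart = pos_ + (isClose ? 2 : 1);
                Token t{isClose ? TokKind::CloseTag : TokKind::OpenTag,
                        in_.substr(nameStart, close - nameStart)};
                pos_ = close + 1;
                return t;
            }
            // Unterminated tag: treat the rest of the input as text.
            Token t{TokKind::Text, in_.substr(pos_)};
            pos_ = in_.size();
            return t;
        }
        std::size_t lt = in_.find('<', pos_);
        if (lt == std::string_view::npos) lt = in_.size();
        Token t{TokKind::Text, in_.substr(pos_, lt - pos_)};
        pos_ = lt;
        return t;
    }

    // Event interface: push tokens to the handler until it returns false
    // or the input ends. Pulling with next() can resume afterwards.
    template <class Handler>
    void tokenize(Handler&& h) {
        for (Token t = next(); t.kind != TokKind::End; t = next())
            if (!h(t)) return;
    }

private:
    std::string_view in_;
    std::size_t pos_ = 0;
};
```

For example, a handler that returns false on the first open tag hands control back to the caller, who can then continue with `next()` from exactly where event parsing stopped.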

Sounds great, but I need to defer XML expertise to others.

The only thing it doesn't parse at this time is inline DTDs inside the doctype.

Also, it currently works only with strings, for simplicity and performance. There is one issue with non-string parsing: when parsing a string, it's easy to just slice the string and move the slices around, but if you're parsing from a generic input range, you basically have to copy characters one by one, which is much less efficient. So ideally the algorithm should use slices whenever it can (when the input is a string).

I'm not sure yet how to attack this problem, but I'm thinking that perhaps parsing primitives should be "part of" the range interface, in the sense that a range should provide specialized implementations of those primitives when it can implement them more efficiently (for instance, by slicing). You wrote a while ago about designing parsing primitives; is that part of Phobos now?
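A minimal C++ sketch of the idea, assuming a "parsing primitive" is something like "consume a run of characters matching a predicate". The name `readWhile` is hypothetical, not a Phobos or standard library function. The generic overload accepts any input iterator but must copy each character; the overload for `std::string_view` returns a slice of the input instead. In D the same split would presumably be expressed with a `static if` on a range trait such as `hasSlicing`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Generic primitive (hypothetical): works with any input iterator,
// but has to copy every matching character into a new string.
template <class InputIt, class Pred>
std::string readWhile(InputIt& first, InputIt last, Pred pred) {
    std::string out;
    while (first != last && pred(*first)) {
        out.push_back(*first);
        ++first;
    }
    return out;
}

// Specialized primitive for sliceable input: no copying, it returns
// the zero-based slice s[0 .. i] and advances s past it.
template <class Pred>
std::string_view readWhile(std::string_view& s, Pred pred) {
    std::size_t i = 0;
    while (i < s.size() && pred(s[i])) ++i;
    std::string_view head = s.substr(0, i);
    s.remove_prefix(i);
    return head;
}
```

Parser code written against `readWhile` picks up the cheaper overload automatically when handed a string, which is roughly the "specializable higher-level parsing function" described in the next paragraph.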

Anyway, the problem above is probably the one reason we might want to write the parser from scratch: it needs to bind to specializable higher-level parsing functions to take advantage of the performance characteristics of certain ranges, such as those you can slice.

There are a number of issues. One is that you should allow wchar and dchar in addition to char as basic character types (probably ubyte too, for exotic encodings). In essence, the char type should be a template parameter. The other is that perhaps you could get by with zero-based slices, i.e. s[0 .. i], as opposed to arbitrary slices s[i .. j]. A zero-based slice can be supported better than an arbitrary one.


Andrei
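To make both suggestions concrete, here is a hypothetical C++ adapter (`PrefixBuffer` is an invented name) that takes the character type as a template parameter and offers only zero-based slices s[0 .. i]. Because it never has to produce s[i .. j] for an arbitrary earlier i, it can sit on top of a plain input iterator, buffer just the lookahead, and discard characters once they are consumed. It is a sketch, not a tuned implementation: `consume` erases from the front of a `std::basic_string`, where a real parser would more likely use a ring buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical adapter: CharT is the character type (char, wchar_t,
// char16_t, char32_t, ...); InputIt is any input iterator yielding CharT.
template <class CharT, class InputIt>
class PrefixBuffer {
public:
    PrefixBuffer(InputIt first, InputIt last) : first_(first), last_(last) {}

    // Pull characters from the source until at least n are buffered
    // (or the source is exhausted); returns the number buffered.
    std::size_t fill(std::size_t n) {
        while (buf_.size() < n && first_ != last_) {
            buf_.push_back(*first_);
            ++first_;
        }
        return buf_.size();
    }

    // Zero-based slice s[0 .. i], shorter if the input ends first.
    // The view is invalidated by the next call to fill, slice or consume.
    std::basic_string_view<CharT> slice(std::size_t i) {
        fill(i);
        return std::basic_string_view<CharT>(buf_.data(), std::min(i, buf_.size()));
    }

    // Drop the first i characters (s = s[i .. $] in D terms).
    void consume(std::size_t i) {
        fill(i);
        buf_.erase(0, std::min(i, buf_.size()));
    }

private:
    InputIt first_, last_;
    std::basic_string<CharT> buf_;  // characters read but not yet consumed
};
```

With `CharT = char16_t`, for instance, the same code handles UTF-16 input read through any iterator, while the parser above it only ever asks for slices starting at the current position.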
