On 2010-06-28 14:27:13 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

>> Here's the generated documentation:
>> 
>> http://michelf.com/docs/d/mfr/xmltok.html
>> http://michelf.com/docs/d/mfr/xml.html
>> 
>> I'm slowly revamping it to use ranges instead of strings.
> 
> I think a tokenizer should be a higher-order range that is fed an input range of ubyte, char, wchar, or dchar (so that would be a type parameter) and is itself a range of Tokens that include the token type, token value etc.

And I've implemented a tokenizer range just like you describe on top of my tokenizer function. Look at the documentation for mfr.xmltok.XMLForwardRange. (I should probably rename it to XMLTokenRange.)
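To give an idea of the shape, such a range is just a struct whose front returns a Token value. Here is a toy sketch, not the actual mfr.xmltok API: the token kinds and fields are made up, and it only separates tags from text.

import std.stdio : writeln;
import std.string : indexOf;

// Toy token kinds; the real tokens carry more information
// (attributes, empty-element flag, etc.).
enum TokenType { tag, text }

struct Token
{
    TokenType type;
    string value;   // tag contents without the angle brackets, or a text run
}

// A minimal range of Tokens over a string, illustrating the
// "tokenizer as a range of Tokens" shape; not the actual XMLForwardRange.
struct TokenRange
{
    private string input;
    private Token current;
    private bool done;

    this(string s) { input = s; popFront(); }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        if (input.length == 0) { done = true; return; }
        if (input[0] == '<')
        {
            // Everything up to the closing '>' is the tag token's value.
            auto close = input.indexOf('>');
            auto end = close < 0 ? input.length : close;
            current = Token(TokenType.tag, input[1 .. end]);
            input = close < 0 ? null : input[close + 1 .. $];
        }
        else
        {
            // Everything up to the next '<' is a text token.
            auto open = input.indexOf('<');
            auto end = open < 0 ? input.length : open;
            current = Token(TokenType.text, input[0 .. end]);
            input = input[end .. $];
        }
    }
}

void main()
{
    foreach (tok; TokenRange("<root>hello<br/></root>"))
        writeln(tok.type, ": ", tok.value);
}

Templating Token and the range on the character type would give the char/wchar/dchar parameterization you describe.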

Personally, I prefer the callback approach, which automatically calls the right function according to the token type. But what's nice about my tokenizer is that you can do both callback-style and pull-style tokenization (the latter can be wrapped in a range), and mix these approaches together as needed.
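To illustrate the two styles with stand-in types (none of this is the real mfr.xmltok signature): in the callback style you hand the tokenizer a delegate per token type you care about and it dispatches for you; in the pull style you drive the loop yourself.

import std.stdio : writeln;

// Illustrative token kinds and a made-up tokenize() driver, only to show
// the two calling styles.
enum TokenType { openTag, closeTag, text }

struct Token
{
    TokenType type;
    string value;
}

// Callback style: the tokenizer drives iteration and calls the handler
// matching each token's type; handlers you don't care about can be omitted.
void tokenize(Token[] tokens,
              void delegate(Token) onOpenTag = null,
              void delegate(Token) onCloseTag = null,
              void delegate(Token) onText = null)
{
    foreach (tok; tokens)
    {
        final switch (tok.type)
        {
            case TokenType.openTag:  if (onOpenTag !is null)  onOpenTag(tok);  break;
            case TokenType.closeTag: if (onCloseTag !is null) onCloseTag(tok); break;
            case TokenType.text:     if (onText !is null)     onText(tok);     break;
        }
    }
}

void main()
{
    auto tokens = [Token(TokenType.openTag, "root"),
                   Token(TokenType.text, "hello"),
                   Token(TokenType.closeTag, "root")];

    // Push style: pass handlers, dispatch happens inside the tokenizer.
    tokenize(tokens,
             delegate(Token t) { writeln("open: ", t.value); },
             null,
             delegate(Token t) { writeln("text: ", t.value); });

    // Pull style: iterate the tokens yourself and branch as needed; this is
    // the loop a range wrapper like XMLForwardRange packages up for you.
    foreach (tok; tokens)
        if (tok.type == TokenType.text)
            writeln("pulled text: ", tok.value);
}

Both styles can sit on top of the same underlying tokenizer function, which is what makes mixing them cheap.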

What is missing is support for arbitrary ranges as input (it currently deals only with strings). Strings are the optimized case for tokenization because you don't have to dynamically allocate anything: slicing the original string is enough to produce substrings. With arbitrary ranges you have to copy the text and tag names into a string one character at a time, which is less efficient. I don't want to write two separate parsers for this, so I'm trying to abstract things at the right level to maximize code reuse while keeping performance optimal for the string-as-input case, but how to do that is not so obvious.
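For example, the substring step could be abstracted on the input type: slice when the input is a string, copy otherwise. A rough sketch, where readName and CharStream are made-up names for illustration:

import std.ascii : isAlphaNum;
import std.range.primitives : ElementType, isInputRange;
import std.stdio : writeln;
import std.traits : isSomeString;

// CharStream is a deliberately minimal input range over a string, standing
// in for "arbitrary range" input such as a decoded file stream.
struct CharStream
{
    string data;
    @property bool empty() { return data.length == 0; }
    @property char front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// readName is a made-up helper: it reads a (crudely defined) XML name from
// the front of the input and returns it. For string input the result is a
// zero-copy slice of the original; for any other input range the characters
// are copied into a newly allocated buffer.
auto readName(R)(ref R input)
    if (isInputRange!R)
{
    static if (isSomeString!R)
    {
        // Fast path: no allocation, just slice the caller's string.
        size_t i = 0;
        while (i < input.length && isAlphaNum(input[i])) ++i;
        auto name = input[0 .. i];
        input = input[i .. $];
        return name;
    }
    else
    {
        // Generic path: copy one character at a time into a new buffer.
        immutable(ElementType!R)[] buf;
        while (!input.empty && isAlphaNum(input.front))
        {
            buf ~= input.front;
            input.popFront();
        }
        return buf;
    }
}

void main()
{
    string s = "root attr='1'>";
    writeln(readName(s));                   // slice of s: "root"

    auto cs = CharStream("root attr='1'>");
    writeln(readName(cs));                  // copied: "root"
}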

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
