On 2010-06-28 14:27:13 -0400, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> said:
>> Here's the generated documentation:
>> http://michelf.com/docs/d/mfr/xmltok.html
>> http://michelf.com/docs/d/mfr/xml.html
>> I'm slowly revamping it to use ranges instead of strings.
>
> I think a tokenizer should be a higher-order range that is fed an input
> range of ubyte, char, wchar, or dchar (so that would be a type
> parameter) and is itself a range of Tokens that include the token type,
> token value etc.
And I've implemented a tokenizer range just like you describe on top of
my tokenizer function. Look at the documentation for
mfr.xmltok.XMLForwardRange. (I should probably rename it to
XMLTokenRange.)
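
To make the shape of such a range concrete, here is a minimal sketch. It is
not the mfr.xmltok API: Token, TokenType, TokenRange and tokenize are
hypothetical names invented for illustration, and the "grammar" only
distinguishes <...> tags from the text between them. It always copies into a
new string, which is the simple but slower approach.

```d
import std.array : appender;
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.traits : isSomeChar;

enum TokenType { text, tag }

struct Token
{
    TokenType type;
    string value;
}

// Higher-order range: consumes an input range of characters and is
// itself an input range of Token. Hypothetical, for illustration only.
struct TokenRange(R)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    private R input;
    private Token current;
    private bool done;

    this(R input)
    {
        this.input = input;
        popFront(); // read the first token so that front is valid
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        if (input.empty) { done = true; return; }

        auto buf = appender!string();
        if (input.front == '<')
        {
            input.popFront(); // skip '<'
            while (!input.empty && input.front != '>')
            {
                buf.put(input.front);
                input.popFront();
            }
            if (!input.empty) input.popFront(); // skip '>'
            current = Token(TokenType.tag, buf.data);
        }
        else
        {
            while (!input.empty && input.front != '<')
            {
                buf.put(input.front);
                input.popFront();
            }
            current = Token(TokenType.text, buf.data);
        }
    }
}

// Convenience wrapper so the type parameter is inferred.
auto tokenize(R)(R input)
{
    return TokenRange!R(input);
}
```

Since the result is an ordinary input range, it can be used with foreach or
with any range algorithm, e.g. foreach (tok; tokenize("<a>hi</a>")) { ... }.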
Personally, I prefer the callback approach, which automatically calls
the right function according to the token type. But what's nice about
my tokenizer is that you can do both callbacks and pull-style
tokenization (the latter can be wrapped in a range), and mix these
approaches together as needed.
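
As a rough illustration of mixing the two styles (again with hypothetical
names, not the actual mfr.xmltok functions): dispatch below is a push-style
wrapper over any range of tokens, and demo pulls the first token by hand
before handing the rest to callbacks.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

enum TokenType { text, tag }

struct Token
{
    TokenType type;
    string value;
}

// Callback style: drain a token range and call the handler that matches
// each token's type. Hypothetical helper, for illustration only.
void dispatch(R)(R tokens,
                 void delegate(string) onTag,
                 void delegate(string) onText)
    if (isInputRange!R && is(ElementType!R : Token))
{
    for (; !tokens.empty; tokens.popFront())
    {
        auto tok = tokens.front;
        final switch (tok.type)
        {
            case TokenType.tag:  onTag(tok.value);  break;
            case TokenType.text: onText(tok.value); break;
        }
    }
}

string[] demo()
{
    // A plain array stands in for the tokenizer's output here.
    Token[] tokens = [Token(TokenType.tag, "title"),
                      Token(TokenType.text, "Hello"),
                      Token(TokenType.tag, "/title")];
    string[] log;

    // Pull style: take the first token by hand.
    auto first = tokens.front;
    tokens.popFront();
    log ~= "pulled " ~ first.value;

    // Callback style for whatever is left.
    dispatch(tokens,
             (string name) { log ~= "tag " ~ name; },
             (string text) { log ~= "text " ~ text; });
    return log;
}
```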
What's missing is support for arbitrary ranges as input (it only deals
with strings currently). Strings are the optimal case for tokenization
because you don't have to dynamically allocate anything: substrings can
simply reference the original string. With arbitrary ranges you have to
copy the text and tag names into a new string one character at a time,
which is less efficient. I don't want to write two separate parsers for
this, so I'm trying to abstract things at the right level to maximize
code reuse while keeping performance optimized for the string-as-input
case, but how to do that is not so obvious.
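
One possible direction, sketched under assumptions and not something the
parser does today: push the difference down into a small primitive that uses
static if to slice when the input is a string and to copy otherwise, so the
code above it stays shared. readUntil is a hypothetical name, and the string
branch compares code units, so it assumes the stop character is ASCII.

```d
import std.array : appender;
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;
import std.traits : isSomeChar, isSomeString;

// Reads characters up to (not including) `stop` and advances `input`
// past what was read. Hypothetical helper, for illustration only.
auto readUntil(R)(ref R input, dchar stop)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    static if (isSomeString!R)
    {
        // String input: no allocation, the result is a slice of the
        // original. Assumes `stop` is a single code unit (ASCII).
        size_t i = 0;
        while (i < input.length && input[i] != stop)
            ++i;
        auto result = input[0 .. i];
        input = input[i .. $];
        return result;
    }
    else
    {
        // Arbitrary range: copy one character at a time into a new string.
        auto buf = appender!string();
        while (!input.empty && input.front != stop)
        {
            buf.put(input.front);
            input.popFront();
        }
        return buf.data;
    }
}
```

The caller looks the same in both cases (auto name = readUntil(input, ' ');),
but only the non-string instantiation allocates. Whether every part of a
tokenizer can be reduced to primitives like this one is exactly the part
that is not obvious.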
--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/