i wrote a simple sax-style xml parser[1][2] for my own needs, and decided to share it. it has two interfaces: the `xmparse()` function, which simply calls callbacks without any validation or encoding conversion, and the `SaxyEx` class, which does some validation, converts content to utf-8 (from anything std.encoding supports), and calls callbacks when a given path is triggered.

it can parse any `char` input range, or a std.stdio.File. parsing files is probably slightly faster than parsing ranges.

internally it extensively reuses the memory buffers it allocates, so it should not put much pressure on the GC.

you are expected to copy any data you want to keep inside your callbacks (not just slice it, but .dup it!).
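
to show why a slice is not enough, here is a minimal sketch. `feedChunks` and `collectText` are hypothetical names made up for this example; they are not saxy's api. they only imitate a parser that hands one reused buffer to a callback, which is an assumption about the general pattern, not a copy of the real code.

```d
import std.stdio : writeln;

// hypothetical stand-in for a parser that reuses one internal buffer.
// NOT saxy's real api, just the same general idea.
void feedChunks(string[] chunks, void delegate(char[] text) onText) {
    auto buf = new char[](64); // allocated once, reused for every chunk
    foreach (chunk; chunks) {
        assert(chunk.length <= buf.length);
        buf[0 .. chunk.length] = chunk[];  // overwrite the previous contents
        onText(buf[0 .. chunk.length]);    // the callback only gets a view into buf
    }
}

// keeps what the callback receives, either as a slice or as a .dup'ed copy
char[][] collectText(string[] chunks, bool copy) {
    char[][] kept;
    feedChunks(chunks, (char[] text) {
        kept ~= copy ? text.dup : text;
    });
    return kept;
}

void main() {
    // slices all point into the same buffer, so the first chunk is lost
    writeln(collectText(["hello", "world"], false)); // ["world", "world"]
    // .dup makes a private copy that survives the next callback
    writeln(collectText(["hello", "world"], true));  // ["hello", "world"]
}
```

the kept slices alias the reused buffer, so they all show whatever was parsed last; the .dup'ed copies stay intact.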

so far i'm using it to parse fb2 files, and it parses an 8.5 megabyte utf-8 file (and creates the internal reader structures, including splitting text into words and some other housekeeping) in one second on my i3 (with dmd -O, even without -inline and -release).

it is not really documented, but i think it is "intuitive". there are also some comments in source code; please, read those! ;-)

p.s. it decodes standard xml entities (&# and &#x probably work right only in utf-8 files, though), and understands CDATA and comments.
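
for reference, a numeric reference like &#65; or &#x20AC; names a unicode code point, so decoding it means turning that number into utf-8 bytes. below is a minimal sketch of that step using phobos only. `decodeCharRef` is a hypothetical helper written for this example; it is not how saxy does it, and it assumes the caller has already stripped the leading "&#" and the trailing ";".

```d
import std.conv : to;
import std.utf : encode;

// hypothetical helper, not part of saxy: takes the inside of a numeric
// reference ("65" for &#65;, "x20AC" for &#x20AC;) and returns utf-8 text.
// throws on malformed numbers or invalid code points.
string decodeCharRef(string refText) {
    immutable isHex = refText.length > 0 && refText[0] == 'x';
    immutable uint code = isHex ? refText[1 .. $].to!uint(16)  // hexadecimal form
                                : refText.to!uint;             // decimal form
    char[4] buf;
    immutable len = encode(buf, cast(dchar) code); // code point -> utf-8 bytes
    return buf[0 .. len].idup;
}

void main() {
    assert(decodeCharRef("65") == "A");
    assert(decodeCharRef("x20AC") == "\u20AC"); // euro sign, 3 bytes in utf-8
}
```

the result is always utf-8, so it only fits in directly when the rest of the text is utf-8 too.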


enjoy, and happy hacking!


[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests
