simple sax-style xml parser

ketmar via Digitalmars-d-announce Tue, 19 Jul 2016 18:52:12 -0700

i wrote a simple sax-style xml parser[1][2] for my own needs, and decided to share it. it has two interfaces: the `xmparse()` function, which simply calls callbacks without any validation or encoding conversion, and the `SaxyEx` class, which does some validation, converts content to utf-8 (from anything std.encoding supports), and calls callbacks when the given path is triggered.

it can parse any `char` input range, or std.stdio.File. parsing files is probably slightly faster than parsing ranges.

internally it extensively reuses the memory buffers it allocated, so it should not put much pressure on the GC.

you are expected to copy any data you need in callbacks (not just slice, but .dup!).
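
to show why a plain slice is not enough, here is a minimal sketch. `ReusingReader` and its `feed()` method are hypothetical stand-ins made up for this example, not the saxy.d api; the only assumption is that the parser hands callbacks a slice of a buffer it will overwrite later.

```d
import std.stdio : writeln;

// hypothetical stand-in for a parser that reuses one internal buffer
// (illustration only; this is NOT the saxy.d api)
struct ReusingReader {
    char[] buf; // allocated once, then overwritten on every call

    void feed(const(char)[] text, scope void delegate(char[] content) onContent) {
        if (buf.length < text.length) buf.length = text.length;
        buf[0 .. text.length] = text[];   // overwrite the old content in place
        onContent(buf[0 .. text.length]); // the callback sees a slice of `buf`
    }
}

void main() {
    ReusingReader rd;
    char[] sliced; // keeps only a slice: aliases the parser's buffer
    string copied; // keeps its own copy

    rd.feed("first", (char[] s) { sliced = s; copied = s.idup; });
    rd.feed("other", (char[] s) {}); // the buffer is reused here

    writeln("sliced: ", sliced); // now shows the second chunk
    writeln("copied: ", copied); // still the first chunk
}
```

the slice silently changes as soon as the buffer is reused, while the `.dup`/`.idup` copy stays intact.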

so far i'm using it to parse fb2 files, and it parses an 8.5 megabyte utf-8 file (and creates internal reader structures, including splitting text into words and some other housekeeping) in one second on my i3 (with dmd -O, even without -inline and -release).

it is not really documented, but i think it is "intuitive". there are also some comments in source code; please, read those! ;-)

p.s. it decodes standard xml entities (&# and &#x probably work right only in utf-8 files, though), and understands CDATA and comments.
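
for reference, a numeric character reference names a unicode code point, so decoding one into utf-8 can be sketched roughly like this. `decodeNumericRef` is a hypothetical helper written for this example (it is not taken from saxy.d), and it expects only the text between `&#` and `;`.

```d
import std.conv : to;
import std.utf : encode;

// hypothetical helper, not part of saxy.d:
// takes "65" (from &#65;) or "x41" (from &#x41;) and returns utf-8 text
string decodeNumericRef(const(char)[] refText) {
    uint code;
    if (refText.length > 1 && refText[0] == 'x')
        code = refText[1 .. $].to!uint(16); // hexadecimal form: &#x...;
    else
        code = refText.to!uint;             // decimal form: &#...;
    char[4] buf;
    // std.utf.encode writes 1..4 utf-8 code units and returns how many;
    // it throws UTFException for an invalid code point
    immutable len = encode(buf, cast(dchar)code);
    return buf[0 .. len].idup;
}
```

the result is always utf-8, which is why output like this only fits cleanly into a document that is itself utf-8 (or has been converted to it).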



enjoy, and happy hacking!


[1] http://repo.or.cz/iv.d.git/blob_plain/HEAD:/saxy.d
[2] http://repo.or.cz/iv.d.git/tree/HEAD:/saxytests
