I have multiple projects that need an XML parser, and std_experimental_xml
is clearly going nowhere, with the guy who wrote it having disappeared into
the ether, so I decided to break down and write one. I've kind of wanted to
for years, but I didn't want to spend the time on it. However, sometime last
year I finally decided that I had to, and it's been what I've been working
on in my free time for a while now. It's finally reached the point where
it makes sense to release it - hence this post.

Currently, dxml contains only a range-based StAX / pull parser and related
helper functions, but the plan is to add a DOM parser as well as two writers
- one which is the writer equivalent of a StAX parser, and one which is
DOM-based. However, in theory, the StAX parser is complete and quite usable
as-is, though I expect that I'll be adding more helper functions to make it
easier to use. If you find that you're doing a particular operation with it
frequently and that operation is overly verbose, please point it out so
that maybe a helper function can be added to improve that use case. E.g.
I'm thinking of adding a function similar to std.getopt.getopt for handling
attributes, because I personally find that dealing with those is more
verbose than I'd like. Obviously, some stuff is just going to do better with
a DOM parser, but thus far, I've found that a StAX parser has suited my
needs quite well. I have no plans to add a SAX parser, since as far as I can
tell, SAX parsers are just plain worse than StAX parsers, and the StAX
approach is quite well-suited to ranges.
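
For anyone who hasn't used a StAX / pull parser before, the general style
looks roughly like the sketch below. It uses Python's standard
xml.dom.pulldom module purely as an illustration of pulling events one at a
time; it is not dxml's API, and the sample document and the collect_books
helper are made up for the example.

```python
from xml.dom import pulldom

# Made-up sample document for illustration only.
XML = '<library><book id="1">Dune</book><book id="2">Emma</book></library>'


def collect_books(xml_text):
    """Pull events one at a time and record each <book>'s id and text."""
    books = []
    current_id = None  # set while we are inside a <book> element
    text = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT and node.tagName == "book":
            # Attributes are available on the start-element event.
            current_id = node.getAttribute("id")
            text = []
        elif event == pulldom.CHARACTERS and current_id is not None:
            # Character data may arrive in more than one chunk.
            text.append(node.data)
        elif event == pulldom.END_ELEMENT and node.tagName == "book":
            books.append((current_id, "".join(text)))
            current_id = None
    return books


if __name__ == "__main__":
    for book_id, title in collect_books(XML):
        print(book_id, title)
```

The calling code drives the loop and decides what to do with each event,
which is the property that maps naturally onto a range: each event plays
the role of the range's front, and advancing the loop plays the role of
popFront.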

Of note, dxml does not support the DTD section beyond what is required to
parse past it. Supporting it would make it impossible for the parser to
return slices of the original input beyond the case where strings are used
(and it would be forced to allocate strings in some cases, whereas dxml does
_very_ minimal heap allocation right now). Parsing the DTD section also
significantly increases the complexity of the parser in order to support
something that I honestly don't think should ever have been part of the XML
standard and is unnecessary for many, many XML documents. So, if you're
dealing with XML documents that contain entities that are declared in the
DTD section and then referenced outside of the DTD section, then dxml will
not support them, but it will work just fine if a DTD section is there so
long as it doesn't declare any entities that are then referenced in the
document proper.
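
To make the distinction concrete, here is a small sketch of the two kinds
of document. It uses Python's standard xml.etree.ElementTree (which does
expand entities declared in an internal DTD subset) only to show what a
DTD-declared entity reference means; it is not dxml code, and both sample
documents and the body_text helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Declares an entity in the DTD section and then references it in the
# document proper. This is the case described above as unsupported.
WITH_DTD_ENTITY = (
    '<!DOCTYPE root [<!ENTITY greeting "hello">]>'
    '<root>&greeting; world</root>'
)

# Has a DTD section, but the body only uses a predefined entity (&amp;),
# so nothing declared in the DTD is referenced in the document proper.
WITHOUT_DTD_ENTITY = (
    '<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]>'
    '<root>hello &amp; goodbye</root>'
)


def body_text(xml_text):
    """Return the root element's text after the parser expands entities."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(body_text(WITH_DTD_ENTITY))
    print(body_text(WITHOUT_DTD_ENTITY))
```

In the first document, the text of the root element cannot be produced as
a slice of the input, because the replacement text comes from the DTD
declaration rather than from the spot where the reference appears.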

Hopefully, the documentation is clear enough, but obviously, I'm not the
best judge of that. So, have at it.

Documentation: http://jmdavisprog.com/docs/dxml/0.1.0/
Github: https://github.com/jmdavis/dxml
Dub: http://code.dlang.org/packages/dxml

- Jonathan M Davis