On Saturday, February 10, 2018 16:14:41 Jacob Carlborg via
Digitalmars-d-announce wrote:
> On 2018-02-09 22:15, Jonathan M Davis wrote:
> > Currently, dxml contains only a range-based StAX / pull parser and
> > related helper functions, but the plan is to add a DOM parser as well
> > as two writers - one which is the writer equivalent of a StAX parser,
> > and one which is DOM-based. However, in theory, the StAX parser is
> > complete and quite useable as-is - though I expect that I'll be adding
> > more helper functions to make it easier to use, and if you find that
> > you're doing a particular operation with it frequently and that that
> > operation is overly verbose, please point it out so that maybe a helper
> > function can be added to improve that use case - e.g.
> This is great news! Have you run any benchmarks to see how it performs?

Kind of. I did some benchmarking to see if some code changes would improve
performance, but I haven't tried benchmarking it against any other XML
libraries. That would take a fair bit of time and effort, and IMHO, that
would be better spent finishing the library first. Also, ldc's latest
release is only up to dmd 2.077.1, and dxml needs an improvement that was
added to byCodeUnit in 2.078.0. So any benchmark that compares dxml with a
C/C++ parsing library while taking differences in the optimizer out of the
equation isn't going to work yet, unless I fork byCodeUnit for dxml until we
get another ldc release.

One result of the benchmarking that I did do allowed me to simplify the code
quite a bit, though. I'd originally made it configurable whether the parser
kept track of both the line number and column in the document, just the line
number, or neither. The theory was that I really wanted the position in the
document to be available for error messages, but tracking it would affect
performance, so it should be configurable. However, benchmarking showed that
the impact on performance was negligible, to the point that different
PositionTypes won out depending on the file and the particular run of the
program, which indicated that the extra complexity was buying me nothing.
There were a fair number of static ifs to deal with that configuration
option, so as soon as I had measured that they didn't really matter, I
removed that option from the Config along with all of its associated static
ifs in the parser, which reduced the complexity of the code a fair bit.
Testing that was actually the main reason that I did any benchmarking before
releasing anything, since I wanted to avoid changing the API later if I
could.

I am going to need to spend more time benchmarking code changes at some
point here though to see if I can make the parser faster, and eventually, I
will probably benchmark it against other parsing libraries. I fully expect
that it will compare favorably given that it does almost no heap allocations
and slices everything, but there's every possibility that I did something
algorithmically in its internals that hurts performance more than it should - e.g.
while it tries to parse everything only once, there are a few places where
it ends up taking a second pass over a piece of text, and refactoring that
is on my todo list (though most of the other potential improvements I did
benchmark were a wash, so I may find that it doesn't matter much).

I'll probably be in more of a hurry to benchmark dxml against other parsing
libraries if my dconf talk proposal on it gets accepted, since that's the
sort of thing that should probably be in such a talk.

I haven't even taken the time yet to figure out which libraries it should be
benchmarked against.

- Jonathan M Davis
