On Saturday, February 10, 2018 16:14:41 Jacob Carlborg via Digitalmars-d-announce wrote:
> On 2018-02-09 22:15, Jonathan M Davis wrote:
> > Currently, dxml contains only a range-based StAX / pull parser and
> > related helper functions, but the plan is to add a DOM parser as well
> > as two writers - one which is the writer equivalent of a StAX parser,
> > and one which is DOM-based. However, in theory, the StAX parser is
> > complete and quite useable as-is - though I expect that I'll be adding
> > more helper functions to make it easier to use, and if you find that
> > you're doing a particular operation with it frequently and that that
> > operation is overly verbose, please point it out so that maybe a helper
> > function can be added to improve that use case - e.g.
>
> This is great news! Have you run any benchmarks to see how it performs?

Kind of. I did some benchmarking to see whether some code changes would improve performance, but I haven't tried benchmarking it against any other XML libraries. That would take a fair bit of time and effort, and IMHO, that time would be better spent finishing the library first. Also, ldc's latest release is only up to dmd 2.077.1, and dxml needs an improvement that got added to byCodeUnit in 2.078.0, so any benchmarking that tries to do something like compare dxml with a C/C++ parsing library while taking the optimizer out of the equation isn't going to work yet unless I fork byCodeUnit for dxml until we get another release of ldc.

One result of the benchmarking that I did do allowed me to simplify the code quite a bit though. I'd originally made it configurable whether the parser kept track of both the line number and column in the document, just the line number, or neither, on the theory that I really wanted access to the position in the document in error messages but that tracking it would affect performance. However, benchmarking showed that it had a negligible impact on performance, to the point that different PositionTypes won out depending on the file and the particular run of the program, indicating that the extra complexity was buying me nothing. There were a fair number of static ifs to deal with that configuration option, so as soon as I was able to measure that they didn't particularly matter, I removed that option from the Config along with all of its associated static ifs in the parser, which reduced the complexity of the code a fair bit. Testing that was actually the main reason that I did any benchmarking before releasing anything, since I wanted to avoid changing the API later if I could.

I am going to need to spend more time benchmarking code changes at some point here though to see if I can make the parser faster, and eventually, I will probably benchmark it against other parsing libraries.
I fully expect that it will compare favorably given that it does almost no heap allocations and slices everything, but there's every possibility that I did something algorithmically that hurts performance more than it should - e.g. while it tries to parse everything only once, there are a few places where it ends up taking a second pass over a piece of text, and refactoring that is on my todo list (though most of the other potential improvements that I did benchmark were a wash, so I may find that it doesn't matter much).

I'll probably be in more of a hurry to benchmark dxml against other parsing libraries if my DConf talk proposal on it gets accepted, since that's the sort of thing that should probably be in such a talk. I haven't even taken the time yet to figure out which libraries it should be benchmarked against.

- Jonathan M Davis