Out of interest, I just tried loading the same file with std.xml, and
the performance there is pretty similar in each version, possibly
slightly slower in 2.058 (~21-22 seconds in each case).
Disabling the GC during the load brings that down to 9 seconds, though
Task Manager reports a peak memory usage of almost 600 megabytes in that
case!
It looks like most of the time here is spent in Gcxmark, whereas with
xmlp it was spent in Gcxfullcollect (and fullcollect is the one that is
faster in 2.058).
The profiler makes it look like more time is being spent in Gcxmark
than before. Is that the case?
I'll have a go with Tango when I get some more time.