Stefan Behnel <sco...@users.sourceforge.net> added the comment:

Given that the links were generally somewhat dated and used Py2.x instead of 
the post-PEP393 Py3.3, here is another little benchmark. It compares the parser 
performance of minidom to lxml.etree (latest), ElementTree and cElementTree 
(stdlib) in a recent Py3.3 build (e66b7c62eec0), everything properly optimised 
for my platform (Linux 64bit). I used os.fork() to start a new process after 
importing everything and reading the file a couple of times, but before 
parsing. The memory usage is measured inside the forked child using the 
resource module's ru_maxrss value (reported in kilobytes on Linux), so it 
correlates with the growth of CPython's memory heap after parsing and thus 
gives an estimate of the maximum amount of memory used during parsing and 
tree building.
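
For illustration, the measurement approach described above could look 
roughly like the following. This is a minimal sketch, not the script that 
produced the numbers below: the helper name measure_in_child() and the pipe 
used to report results back to the parent are assumptions of mine, and it 
only uses stdlib parsers.

```python
import os
import resource
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def measure_in_child(parse, path):
    """Fork, run parse(path) in the child and return
    (seconds, maxrss_before, maxrss_after) as seen by the child.

    Hypothetical helper; ru_maxrss is in kilobytes on Linux.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: measure, report through the pipe, exit immediately.
        status = 1
        try:
            os.close(read_fd)
            before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            start = time.perf_counter()
            tree = parse(path)  # keep a reference so the tree stays alive
            elapsed = time.perf_counter() - start
            after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            os.write(write_fd, f"{elapsed} {before} {after}".encode())
            status = 0
        finally:
            os._exit(status)  # skip the parent's cleanup handlers
    # Parent process: read the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    if not data:
        raise RuntimeError("child process failed before reporting")
    elapsed, before, after = data.split()
    return float(elapsed), int(before), int(after)
```

It would be called as measure_in_child(ET.parse, "hamlet.xml") or 
measure_in_child(xml.dom.minidom.parse, "hamlet.xml"). Forking per parser 
keeps one parser's allocations from inflating the peak value reported for 
the next one.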

Parsing hamlet.xml in English, 274KB:

Memory usage: 7284
xml.etree.ElementTree.parse done in 0.104 seconds
Memory usage: 14240 (+6956)
xml.etree.cElementTree.parse done in 0.022 seconds
Memory usage: 9736 (+2452)
lxml.etree.parse done in 0.014 seconds
Memory usage: 11028 (+3744)
minidom tree read in 0.152 seconds
Memory usage: 30360 (+23076)

Parsing the Old Testament in English (ot.xml, 3.4MB) into memory:

Memory usage: 20444
xml.etree.ElementTree.parse done in 0.385 seconds
Memory usage: 46088 (+25644)
xml.etree.cElementTree.parse done in 0.056 seconds
Memory usage: 32628 (+12184)
lxml.etree.parse done in 0.041 seconds
Memory usage: 37500 (+17056)
minidom tree read in 0.672 seconds
Memory usage: 110428 (+89984)

A 25MB XML file with Slavic Unicode text content:

Memory usage: 57368
xml.etree.ElementTree.parse done in 3.274 seconds
Memory usage: 223720 (+166352)
xml.etree.cElementTree.parse done in 0.459 seconds
Memory usage: 154012 (+96644)
lxml.etree.parse done in 0.454 seconds
Memory usage: 135720 (+78352)
minidom tree read in 6.193 seconds
Memory usage: 604860 (+547492)

And a contrived 4.5MB XML file with a lot more structure than data:

Memory usage: 13308
xml.etree.ElementTree.parse done in 4.178 seconds
Memory usage: 222088 (+208780)
xml.etree.cElementTree.parse done in 0.478 seconds
Memory usage: 103056 (+89748)
lxml.etree.parse done in 0.199 seconds
Memory usage: 101860 (+88552)
minidom tree read in 8.705 seconds
Memory usage: 810964 (+797656)

Things to note: minidom's memory overhead compared to cET is a factor of 5-10, 
and the exact factor depends heavily on the data. Also, minidom is 
consistently slower by more than a factor of 10 compared to the fastest parser 
(apparently the one in libxml2/lxml.etree, neither of which can surely be said 
to provide fewer features than the DOM that minidom implements).
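
The factors quoted above can be rechecked from the posted figures with a 
few lines of Python. The numbers are copied from the output above; the 
dictionary layout, the labels for the two unnamed files and the helper name 
minidom_ratios() are just my own illustration.

```python
# Per file and parser: (parse time in seconds, memory growth as reported).
results = {
    "hamlet.xml": {"ET": (0.104, 6956), "cET": (0.022, 2452),
                   "lxml": (0.014, 3744), "minidom": (0.152, 23076)},
    "ot.xml": {"ET": (0.385, 25644), "cET": (0.056, 12184),
               "lxml": (0.041, 17056), "minidom": (0.672, 89984)},
    "25MB Slavic": {"ET": (3.274, 166352), "cET": (0.459, 96644),
                    "lxml": (0.454, 78352), "minidom": (6.193, 547492)},
    "4.5MB contrived": {"ET": (4.178, 208780), "cET": (0.478, 89748),
                        "lxml": (0.199, 88552), "minidom": (8.705, 797656)},
}


def minidom_ratios(run):
    """Return (memory ratio vs. cET, time ratio vs. fastest other parser)."""
    md_time, md_mem = run["minidom"]
    fastest = min(t for name, (t, _) in run.items() if name != "minidom")
    return md_mem / run["cET"][1], md_time / fastest


for label, run in results.items():
    mem_ratio, time_ratio = minidom_ratios(run)
    print(f"{label}: memory x{mem_ratio:.1f} vs. cET, "
          f"time x{time_ratio:.1f} vs. fastest")
```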

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11379>
_______________________________________