Stefan Behnel <sco...@users.sourceforge.net> added the comment: Given that the links were generally somewhat dated and used Py2.x instead of the post-PEP393 Py3.3, here is another little benchmark, comparing the parser performance of minidom to lxml.etree (latest), ElementTree and cElementTree (stdlib) in a recent Py3.3 build (e66b7c62eec0), everything properly optimised for my platform (Linux 64bit). I used os.fork() to start a new process after importing everything and reading the file a couple of times, and before parsing. The memory usage is measured inside of the forked child using the resource module's ru_maxrss value, so it correlates with the growth of CPython's memory heap after parsing, thus giving an estimate of the maximum amount of memory used during parsing and tree building.
Parsing hamlet.xml in English, 274KB: Memory usage: 7284 xml.etree.ElementTree.parse done in 0.104 seconds Memory usage: 14240 (+6956) xml.etree.cElementTree.parse done in 0.022 seconds Memory usage: 9736 (+2452) lxml.etree.parse done in 0.014 seconds Memory usage: 11028 (+3744) minidom tree read in 0.152 seconds Memory usage: 30360 (+23076) Parsing the old testament in English (ot.xml, 3.4MB) into memory: Memory usage: 20444 xml.etree.ElementTree.parse done in 0.385 seconds Memory usage: 46088 (+25644) xml.etree.cElementTree.parse done in 0.056 seconds Memory usage: 32628 (+12184) lxml.etree.parse done in 0.041 seconds Memory usage: 37500 (+17056) minidom tree read in 0.672 seconds Memory usage: 110428 (+89984) A 25MB XML file with Slavic Unicode text content: Memory usage: 57368 xml.etree.ElementTree.parse done in 3.274 seconds Memory usage: 223720 (+166352) xml.etree.cElementTree.parse done in 0.459 seconds Memory usage: 154012 (+96644) lxml.etree.parse done in 0.454 seconds Memory usage: 135720 (+78352) minidom tree read in 6.193 seconds Memory usage: 604860 (+547492) And a contrived 4.5MB XML file with lot more structure than data: Memory usage: 13308 xml.etree.ElementTree.parse done in 4.178 seconds Memory usage: 222088 (+208780) xml.etree.cElementTree.parse done in 0.478 seconds Memory usage: 103056 (+89748) lxml.etree.parse done in 0.199 seconds Memory usage: 101860 (+88552) minidom tree read in 8.705 seconds Memory usage: 810964 (+797656) Things to note: The factor of 5-10 for the memory overhead compared to cET depends heavily on the data. Also, minidom is consistently slower by more than a factor of 10 compared to the fastest parser (apparently the one in libxml2/lxml.etree, both of which surely can't be said to provide less features than the DOM that minidom implements). ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue11379> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com