Hi there,

As the original author of lxml I might have a few comments.

I think when evaluating an XML library, you want to look at a number of 
things:

* usability of the APIs

* performance

* features

* ease of installation

Here I'll evaluate the options given my impressions:

xml.dom.minidom

API: standard-ish (it follows the DOM), but not very Pythonic
Performance: slow in comparison, gobbles up memory
Features: just a tree, that's it.
Ease of installation: in standard library
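
To illustrate what "not very Pythonic" means in practice, here is a minimal sketch of reading a small document with minidom. The sample XML and its element names are made up for illustration only.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

doc = minidom.parseString(XML)

titles = []
for book in doc.getElementsByTagName("book"):
    title_node = book.getElementsByTagName("title")[0]
    # Text lives in a child text node, not on the element itself.
    titles.append((book.getAttribute("id"), title_node.firstChild.data))

print(titles)
```

Note how even plain text content has to be reached through firstChild.data.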

xml.sax

API: standard as SAX, but very low-level so relatively hard to work with
Performance: high - I haven't measured the SAX parser in the standard 
library, but it's probably more than adequate for most purposes. Low 
memory usage.
Features: almost nothing, just low-level SAX events
Ease of installation: in standard library
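
As a rough sketch of how low-level the SAX events are, the handler below collects the text of every title element and has to track its own state to do so. The sample XML and the TitleHandler class are hypothetical and only serve as an illustration.

```python
import xml.sax

# Hypothetical sample document, used only for illustration.
XML = (
    b"<library>"
    b"<book id='1'><title>Dune</title></book>"
    b"<book id='2'><title>Emma</title></book>"
    b"</library>"
)


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text content of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(XML, handler)
print(handler.titles)
```

Nothing is kept in memory beyond what the handler stores itself, which is where the low memory usage comes from.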

ElementTree

API: non-standard, but very Pythonic and easy to use
Performance: better than xml.dom.minidom and smaller in memory use
Features: a tree with a 'find' function for easy finding and some other 
helper functions. Also iterparse for SAX-style parsing is available. I 
believe it offers BeautifulSoup integration for messy HTML parsing.
Ease of installation: in standard library (newer Pythons)
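
For comparison, a minimal sketch of the same kind of lookup with ElementTree, using find and findall. The sample XML is again made up, and the import path shown is the one used in the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

root = ET.fromstring(XML)

# find() returns the first match, findall() returns all of them.
first_title = root.find("book/title").text
titles = [t.text for t in root.findall("book/title")]

# Attributes are read with a dict-like get().
ids = [book.get("id") for book in root.findall("book")]

print(first_title, titles, ids)
```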

cElementTree:

Same as ElementTree, but higher performance. Probably similar to SAX or 
faster, though with more memory usage, since it builds a tree.

lxml

API: same as ElementTree with many extensions for XML features, better 
namespace support, parent node references (which ElementTree lacks), 
objectify for XML-object mapping.
Performance: high, on par with cElementTree, sometimes faster, sometimes 
slower. Offers more features which can help boost performance at some 
points, such as XPath.
Features: tons of features. xpath is *very* useful when reading in XML. 
XSLT, schemas, fast messy HTML parser support, etc.
Ease of installation: harder. C compiler needed (except on Windows). 
libxml2/libxslt installation needed (except on Windows, where it's 
bundled). On Linux installation is usually easy enough, as a C compiler 
is available and a recent enough version of libxml2 is typically 
installed. Mac OS ships a version of libxml2 that is some years old and 
too old, so more installation effort is needed.
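
A small sketch of two of the lxml extensions mentioned above, XPath and parent node references. It assumes lxml is installed (the import is guarded so the snippet still loads without it); the sample XML and the helper name titles_with_parent_id are hypothetical.

```python
try:
    from lxml import etree
except ImportError:  # lxml is a third-party package and may be missing
    etree = None

# Hypothetical sample document, used only for illustration.
XML = (
    b"<library>"
    b"<book id='1'><title>Dune</title></book>"
    b"<book id='2'><title>Emma</title></book>"
    b"<book id='3'><title>Ulysses</title></book>"
    b"</library>"
)


def titles_with_parent_id(xml_bytes):
    """Return (book id, title) pairs for books whose id is above 1."""
    root = etree.fromstring(xml_bytes)
    result = []
    # xpath() accepts a full XPath 1.0 expression, including predicates.
    for title in root.xpath("//book[@id > 1]/title"):
        # getparent() is the parent reference that ElementTree lacks.
        parent = title.getparent()
        result.append((parent.get("id"), title.text))
    return result


if etree is not None:
    print(titles_with_parent_id(XML))
```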

For your purposes, I think (c)ElementTree is a good bet, but if you need 
a lot of features (which can be very convenient to have around), try 
lxml. I'd avoid MiniDOM, though it's probably not a disaster if 
performance is unimportant. If you have huge XML files to parse and you 
need a stream-based parser, you may want to see whether iterparse in 
ElementTree (or lxml) is a more convenient approach than SAX.

http://effbot.org/zone/element-iterparse.htm
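
As a minimal sketch of the iterparse approach for large files: each element is handled as soon as its end tag is seen and then cleared, so the whole tree never has to stay in memory. The sample data and the helper name iter_titles are made up for illustration; with a real file you would pass a filename or file object instead of the BytesIO.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(source):
    """Yield (book id, title) pairs while parsing incrementally."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.get("id"), elem.findtext("title")
            # Drop the element's children and attributes once handled,
            # so memory use stays low on large inputs.
            elem.clear()


# Hypothetical sample data standing in for a large file.
stream = io.BytesIO(
    b"<library>"
    b"<book id='1'><title>Dune</title></book>"
    b"<book id='2'><title>Emma</title></book>"
    b"</library>"
)

titles = list(iter_titles(stream))
print(titles)
```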

Since you all are interested in performance, here are some interesting 
URLs on lxml performance:

http://codespeak.net/lxml/performance.html
http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Regards,

Martijn


_______________________________________________
Freevo-devel mailing list
Freevo-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freevo-devel
