Hi there,

As the original author of lxml I might have a few comments.
I think when evaluating an XML library, you want to look at a number of things:

* usability of the APIs
* performance
* features
* ease of installation

Here I'll evaluate the options given my impressions:

xml.dom.minidom

API: standard-ish as DOM, but not very Pythonic
Performance: slow in comparison, gobbles up memory
Features: just a tree, that's it.
Ease of installation: in standard library

xml.sax

API: standard as SAX, but very low-level so relatively hard to work with
Performance: high. I haven't measured the SAX parser in the standard library, but it's probably more than adequate for most purposes. Low memory usage.
Features: almost nothing, just low-level SAX events
Ease of installation: in standard library

ElementTree

API: non-standard, but very Pythonic and easy to use
Performance: better than xml.dom.minidom and smaller in memory use
Features: a tree with a 'find' function for easy finding and some other helper functions. Also iterparse for SAX-style parsing is available. I believe it offers BeautifulSoup integration for messy HTML parsing.
Ease of installation: in standard library (newer Pythons)

cElementTree

Same as ElementTree, but higher performance. Probably similar to SAX or faster, though with more memory usage as it is tree-based.

lxml

API: same as ElementTree with many extensions for XML features, better namespace support, parent node references (which ElementTree lacks), objectify for XML-object mapping.
Performance: high, on par with cElementTree, sometimes faster, sometimes slower. Offers more features which can help boost performance at some points, such as XPath.
Features: tons of features. XPath is *very* useful when reading in XML. XSLT, schemas, fast messy HTML parser support, etc.
Ease of installation: harder. A C compiler is needed (except on Windows), and so is a libxml2/libxslt installation (except on Windows, where they are bundled). On Linux installation is usually easy enough, as a C compiler is available and a recent enough version of libxml2 is typically installed.
On Mac OS the version of libxml2 that ships with the system is some years old and too old, so more installation effort is needed.

For your purposes, I think (c)ElementTree is a good bet, but if you need a lot of features (which can be very convenient to have around), try lxml. I'd avoid minidom, though it's probably not a disaster if performance is unimportant.

If you have huge XML files to parse and you need a stream-based parser, you may want to see whether iterparse in ElementTree (or lxml) is a more convenient approach than SAX:

http://effbot.org/zone/element-iterparse.htm

Since you all are interested in performance, here are some interesting URLs on lxml performance:

http://codespeak.net/lxml/performance.html
http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Regards,

Martijn

_______________________________________________
Freevo-devel mailing list
Freevo-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freevo-devel
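
To make the iterparse suggestion above a bit more concrete, here is a minimal sketch using the ElementTree module from the standard library. The sample document, the `movie` and `title` tag names and the `iter_titles` helper are all made up for illustration; they are not taken from Freevo or from any real data. The idea is simply to handle each element as soon as it has been parsed and then clear it, so memory use stays roughly flat on huge files.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; in real use you would pass a filename or an
# open file object to iterparse instead of an in-memory buffer.
SAMPLE = b"""<library>
  <movie year="1999"><title>The Matrix</title></movie>
  <movie year="2003"><title>Big Fish</title></movie>
  <movie year="2008"><title>Wall-E</title></movie>
</library>"""


def iter_titles(source, tag="movie"):
    """Yield (year, title) pairs for each matching element, one at a time."""
    # By default iterparse only reports "end" events, so when we see an
    # element, all of its children have already been parsed.
    for _event, elem in ET.iterparse(source):
        if elem.tag == tag:
            yield elem.get("year"), elem.findtext("title")
            # Drop the element's children, text and attributes once we are
            # done with it. The emptied element itself stays attached to
            # the root, which is usually cheap enough.
            elem.clear()


titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

lxml also provides an iterparse with a compatible interface, so the same loop should carry over if you switch libraries later.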