There are various packages available for XML processing in Python. So which one should you choose, and when? I have summarized the features, advantages and disadvantages of some of these packages in the text below. Have a look; I hope it helps you out of the dilemma of choice.
Here we go:

OPTIONS
=======
- libxml2
- lxml
- PyXML
- 4Suite

DESCRIPTION
===========

-------
libxml2
-------
A quote by Mark Pilgrim: "Programming with libxml2 is like the thrilling embrace of an exotic stranger. It seems to have the potential to fulfill your wildest dreams, but there's a nagging voice somewhere in your head warning you that you're about to get screwed in the worst way."

Features:
=========
- Namespaces in XML
- XPath, XPointer, XInclude, XML Base
- XML Schemas Part 2: Datatypes
- Relax NG
- SAX: a SAX2-like interface and a minimal SAX1 implementation compatible with early expat versions
- NO DOM: it supports DOM to some extent, BUT it does not implement the API itself; gdome2 does that on top of libxml2.
- It is written in plain C, making as few assumptions as possible and sticking closely to ANSI C/POSIX for easy embedding.
- Platform: Linux/Unix/Windows

Advantages
==========
- Standards-compliant XML support.
- Full-featured.
- Actively maintained by XML experts.
- Fast. Fast! FAST!
- Stable.

Disadvantages
=============
This library already ships with Python bindings, but these bindings have some problems:
- Very low level and C-ish (not Pythonic).
- Underdocumented and huge; you get lost in them.
- UTF-8 in the API instead of Python unicode strings.
- Can cause segfaults from Python.
- Manual memory management: the library calls are more or less an exact mapping of the C API, so you have to think about memory management yourself.

For those who want to go for the DOM API:

Packages for DOM
================
- gdome2: provides DOM support on top of libxml2. C-based.
  (http://gdome2.cs.unibo.it/)
- libxml2dom: another available option.
  (http://cheeseshop.python.org/pypi/libxml2dom/0.3.3)
- libxml_domlib: a Python extension module that enables you to use the DOM interface to libxml2.
  (http://www.rexx.com/~dkuhlman/libxml_domlib.html)

Resources
=========
- http://xmlsoft.org/index.html
- http://codespeak.net/lxml/intro.html

----
lxml
----
lxml follows the ElementTree API as much as possible, building it on top of the native libxml2 tree.

Features
========
- lxml provides all of the libxml2 features listed above, but through the ElementTree API.

Advantages
==========
- Pythonic API.
- Documented.
- Uses Python unicode strings in the API.
- Safe (no segfaults).
- No manual memory management.

Disadvantages
=============
- No DOM support (same as libxml2).
- It is still an early release (the latest is lxml 0.7).

Resources
=========
- http://codespeak.net/lxml/

-----
PyXML
-----

Features
========
- xmlproc: a validating XML parser.
- Expat: a fast non-validating parser.
- sgmlop: a C helper module that can speed up xmllib.py and sgmllib.py by a factor of 5.
- PySAX: SAX1 and SAX2 libraries with drivers for most of the parsers.
- 4DOM: a fully compliant DOM Level 2 implementation.
- pulldom: a DOM implementation that supports lazy instantiation of nodes.
- marshal: a module with several options for serializing Python objects to XML.

Advantages
==========
- A lot of documentation is available, and almost all resources and examples are based on it.

Disadvantages
=============
- No Schema support.

Packages for Schema (for those who want schema support too)
===========================================================
- XSV: currently in progress; provides XML Schema Part 1: Structures. It depends on another package, PyLTXML.
  (http://www.ltg.ed.ac.uk/~ht/xsv-status.html)

------
4Suite
------

Features:
=========
- XML, XSLT, XPath, DOM, XInclude, XPointer, XLink, XUpdate, RELAX NG, XML Catalogs
- Platform: POSIX, Windows

Advantages
==========
- It provides RELAX NG: RELAX NG is a simple schema language for XML, based on [RELAX] and [TREX].
  A RELAX NG schema specifies a pattern for the structure and content of an XML document.
  [1] http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDAGDYR
  [2] http://xmlbuddy.com/2.0/features.html
  [3] http://www.xml.com/pub/a/2001/12/12/schemacompare.html?page=2
  * Note that RELAX NG is not a W3C standard; it is published by OASIS.

Site:
=====
[4] http://cheeseshop.python.org/pypi/4Suite-XML/1.0b3
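
To give a feel for the ElementTree-style API that lxml follows (see the lxml entry above), here is a minimal sketch. It is only an illustration under some assumptions: it uses the standard-library xml.etree.ElementTree module rather than lxml itself, on the assumption that with lxml installed "from lxml import etree as ET" offers the same calls. The sample document and the function names names_with_dom and add_package are made up for this example.

```python
# Sketch only: uses the standard-library ElementTree module.
# Assumption: with lxml installed, "from lxml import etree as ET"
# could be swapped in, since lxml follows the ElementTree API.
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<packages>
  <package name="libxml2" dom="no"/>
  <package name="lxml" dom="no"/>
  <package name="PyXML" dom="yes"/>
  <package name="4Suite" dom="yes"/>
</packages>"""


def names_with_dom(xml_text):
    """Return the names of packages whose dom attribute is 'yes'."""
    root = ET.fromstring(xml_text)
    return [pkg.get("name")
            for pkg in root.findall("package")
            if pkg.get("dom") == "yes"]


def add_package(xml_text, name, dom):
    """Append one <package> element and return the serialized document."""
    root = ET.fromstring(xml_text)
    # Elements are plain Python objects; no manual memory management.
    ET.SubElement(root, "package", name=name, dom=dom)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(names_with_dom(SAMPLE))
    print(add_package(SAMPLE, "gdome2", "yes"))
```

Elements behave like ordinary Python objects with attribute lookups and list-style searching, which is the "Pythonic API" point listed under the lxml advantages.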