There are various packages available for XML processing in Python.
So which one should you choose, and when? I have summarized the features,
advantages and disadvantages of some of these packages in the following text.
Have a look at it. I hope it gets you out of the dilemma of choice.

Here we go:

OPTIONS
=========
- libxml2
- lxml
- PyXML
- 4Suite



DESCRIPTION
=============


-------
libxml2
-------
A quote by Mark Pilgrim: "Programming with libxml2 is like the
thrilling embrace of an exotic stranger. It seems to have the potential
to fulfill your wildest dreams, but there's a nagging voice somewhere
in your head warning you that you're about to get screwed in the worst
way."

        Features:
        =========
          - Namespaces in XML
          - XPath, XPointer, XInclude and XML Base
          - XML Schemas Part 2: Datatypes
          - Relax NG
          - SAX: a SAX2-like interface and a minimal SAX1 implementation
            compatible with early expat versions
          - No DOM: it provides DOM support to some extent, but it does not
            implement the API itself; gdome2 does that on top of libxml2.
          - It is written in plain C, making as few assumptions as possible
            and sticking closely to ANSI C/POSIX for easy embedding.
          - Platform: Linux/Unix/Windows


        Advantages
        ==========
          - Standards-compliant XML support.
          - Full-featured.
          - Actively maintained by XML experts.
          - Fast. Fast! FAST!
          - Stable.

        Disadvantages
        =============
        This library already ships with Python bindings, but these
        bindings have some problems:

          - Very low level and C-ish (not Pythonic).
          - Underdocumented and huge; you get lost in them.
          - UTF-8 in the API instead of Python unicode strings.
          - Can cause segfaults from Python.
          - Manual memory management is required: the library calls are
            more or less an exact mapping of the C API, so you have to
            think about freeing memory yourself.
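
To give a feel for the manual memory management mentioned above, here is a
minimal sketch. It assumes the libxml2 Python bindings are installed (the
import is guarded because they often are not); the helper name
count_elements and the sample document are made up for illustration.

```python
try:
    import libxml2  # libxml2's own Python bindings; may not be installed
except ImportError:
    libxml2 = None


def count_elements(xml_text):
    """Count all elements in xml_text using the raw libxml2 bindings.

    Hypothetical helper: both the document and the XPath context must be
    freed by hand, exactly as in the C API.
    """
    doc = libxml2.parseDoc(xml_text)
    ctxt = doc.xpathNewContext()
    try:
        # "//*" selects every element node in the document
        return len(ctxt.xpathEval("//*"))
    finally:
        # Forgetting either call leaks memory held on the C side
        ctxt.xpathFreeContext()
        doc.freeDoc()


if libxml2 is not None:
    print(count_elements("<a><b/><c/></a>"))
```

The try/finally is there so that both objects are released even if the
XPath evaluation raises.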

        For those who want to go for a DOM API:

        Packages for DOM
        ================
          - gdome2: provides DOM support on top of libxml2; C-based.
            (http://gdome2.cs.unibo.it/)
          - libxml2dom: another available option.
            (http://cheeseshop.python.org/pypi/libxml2dom/0.3.3)
          - libxml_domlib: a Python extension module that enables you to
            use the DOM interface to libxml2.
            (http://www.rexx.com/~dkuhlman/libxml_domlib.html)


        Resources
        ==========
          - http://xmlsoft.org/index.html
          - http://codespeak.net/lxml/intro.html


----
lxml
----
        lxml follows the ElementTree API as much as possible, building it
        on top of the native libxml2 tree.
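
For readers who have not seen the ElementTree API, here is a small sketch of
what it looks like. It uses the standard-library xml.etree.ElementTree so it
runs without extra installs; since lxml follows the same API, swapping the
import for "from lxml import etree as ET" should behave much the same for
these calls (that swap is an assumption, not something tested here). The
sample document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from any real catalogue
SAMPLE = (
    "<library>"
    "<book id='1'><title>First Book</title></book>"
    "<book id='2'><title>Second Book</title></book>"
    "</library>"
)

root = ET.fromstring(SAMPLE)

# Elements behave like lists of children and carry dict-like attributes
titles = [book.findtext("title") for book in root.findall("book")]
ids = [book.get("id") for book in root.iter("book")]

# Building new nodes is plain Python, with no manual memory management
new_book = ET.SubElement(root, "book", id="3")
ET.SubElement(new_book, "title").text = "Third Book"

serialized = ET.tostring(root, encoding="unicode")
print(titles, ids)
print(serialized)
```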

        Features
        ========
         - lxml provides all of the libxml2 features listed above, but
           through the ElementTree API.

        Advantages
        ==========
         - Pythonic API.
         - Documented.
         - Uses Python unicode strings in the API.
         - Safe (no segfaults).
         - No manual memory management.


        Disadvantages
        ==============
        - No DOM support, just as with libxml2.
        - It is still in its early releases (the latest is lxml 0.7).


        Resources
        =========
        - http://codespeak.net/lxml/


------
PyXML
------
        Features
        =========
         - xmlproc: a validating XML parser.
         - Expat: a fast non-validating parser.
         - sgmlop: a C helper module that can speed up xmllib.py and
           sgmllib.py by a factor of 5.
         - PySAX: SAX1 and SAX2 libraries with drivers for most of the
           parsers.
         - 4DOM: a fully compliant DOM Level 2 implementation.
         - pulldom: a DOM implementation that supports lazy instantiation
           of nodes.
         - marshal: a module with several options for serializing Python
           objects to XML.
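
To illustrate the difference between the SAX and pulldom styles listed above,
here is a rough sketch. It assumes the standard-library xml.sax and
xml.dom.pulldom modules rather than a PyXML installation; the ItemNames
handler class and the sample document are made up for this example.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample data
SAMPLE = "<items><item name='a'>1</item><item name='b'>2</item></items>"


class ItemNames(xml.sax.ContentHandler):
    """SAX style: the parser pushes events into callback methods."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "item":
            self.names.append(attrs.get("name"))


handler = ItemNames()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
sax_names = handler.names

# pulldom style: we pull events from a stream and build a DOM subtree
# only for the nodes we are interested in (lazy instantiation).
pulled = {}
events = pulldom.parseString(SAMPLE)
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "item":
        events.expandNode(node)  # now node has its children attached
        pulled[node.getAttribute("name")] = node.firstChild.data

print(sax_names, pulled)
```

With SAX you never hold a tree at all, while pulldom lets you decide node by
node whether a subtree is worth building.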


        Advantages
        ==========
         - A lot of documentation is available, and almost all resources
           and examples are based on it.

        Disadvantages
        =============
        - No Schema support

        For those who want schema support too:

        Packages for Schema
        ===================
        - XSV: currently in progress; provides XML Schema Part 1:
          Structures. It depends on another package, PyLTXML.
          (http://www.ltg.ed.ac.uk/~ht/xsv-status.html)




-------
4Suite
-------
          Features:
          =========
          - XML, XSLT, XPath, DOM, XInclude, XPointer, XLink, XUpdate,
            RELAX NG, XML Catalogs
          - Platform: Posix, Windows

          Advantages
          ==========
          - It provides RELAX NG, a simple schema language for XML based
            on [RELAX] and [TREX]. A RELAX NG schema specifies a pattern
            for the structure and content of an XML document.

            [1] http://www.oasis-open.org/committees/relax-ng/spec-20011203.html#IDAGDYR
            [2] http://xmlbuddy.com/2.0/features.html
            [3] http://www.xml.com/pub/a/2001/12/12/schemacompare.html?page=2

            * Note, however, that RELAX NG is not a W3C standard. It is
              provided by OASIS.


          Site
          ====
          [4] http://cheeseshop.python.org/pypi/4Suite-XML/1.0b3
