Re: ElementTree Namespace Prefixes
Chris Spencer:

> Fredrik Lundh wrote:
>> Chris Spencer wrote:
>>> If an XML parser reads in and then writes out a document without having altered it, then the new document should be the same as the original.
>> says who?
> Good question. There is no One True Answer even within the XML standards. It all boils down to how you define "the same". Which parts of the XML document are meaningful content that needs to be preserved and which ones are mere encoding variations that may be omitted from the internal representation?

One can point out the XML namespaces spec all one wants, but it doesn't matter. The fact is that regardless of what that spec says, as you say, Chris, there are too many XML technologies that require prefix retention. As a simple example, XPath and XSLT, W3C specs just like XMLNS, use QNames in context, which requires prefix retention.

Besides all that, prefix retention is generally more user-friendly in round-trip or partial round-trip scenarios. That's why cDomlette, part of 4Suite [1], and Amara [2], a more Pythonic API for this, both support prefix retention by default.

[1] http://4suite.org
[2] http://uche.ogbuji.net/tech/4Suite/amara/

--
Uche
http://copia.ogbuji.net
--
http://mail.python.org/mailman/listinfo/python-list
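To make the point about prefixes carrying meaning concrete, here is a minimal sketch, assuming a current Python 3 and its standard-library xml.etree.ElementTree. The document, the `urn:example:types` URI and the `kind` attribute are made up for illustration. The attribute value holds a prefixed name, which the parser treats as plain text, so the declaration it depends on does not survive a round trip:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute *value* refers to the prefix "t".
SRC = '<doc xmlns:t="urn:example:types" kind="t:widget"/>'

root = ET.fromstring(SRC)

# Namespace declarations are not stored as attributes; only "kind" remains.
print(root.attrib)

# On output, only namespaces used by tag or attribute names are declared,
# so the binding for "t" is gone while the value still says "t:widget".
out = ET.tostring(root, encoding="unicode")
print(out)
```

After the round trip, a consumer that resolves `t:widget` against the in-scope namespaces has nothing to resolve it with, which is the kind of breakage described above.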
Re: ElementTree Namespace Prefixes
Jarek Zgoda wrote:
> [snip]
>> It's a shame the default ns behavior in ElementTree is in such a poor state. I'm surprised no one's forked ElementTree solely to fix this issue.
> There is at least one ElementTree API implementation that retains prefixes, lxml.etree. Go google for it.

Just to make it explicitly clear, lxml is not a fork of ElementTree, but a reimplementation of the API on top of libxml2.

lxml indeed retains prefixes, and since version 0.7, released earlier this week, it's also possible to get some control over the generation of prefixes during element construction.

You can find it here: http://codespeak.net/lxml

Regards,
Martijn
Re: ElementTree Namespace Prefixes
> you forgot http://effbot.org/zone/element-infoset.htm which describes the 3-node XML infoset subset used by ElementTree.

No, I did not forget your infoset subset. I was comparing it with other infoset subsets described in various XML specifications.

I agree 100% that prefixes were not *supposed* to be part of the document's meaning back when the XML namespace specification was written, but later specifications broke that.

Please take a look at http://www.w3.org/TR/xml-c14n#NoNSPrefixRewriting

"... there now exist a number of contexts in which namespace prefixes can impart information value in an XML document..."

"...Moreover, it is possible to prove that namespace rewriting is harmful, rather than simply ineffective."
Re: ElementTree Namespace Prefixes
Chris Spencer wrote:
> If an XML parser reads in and then writes out a document without having altered it, then the new document should be the same as the original.

says who?

> With ElementTree this isn't so. Lundh apparently believes he knows better than you and I on how our namespaces should be represented.

do you even understand how XML namespaces work?

/F
Re: ElementTree Namespace Prefixes
Fredrik Lundh wrote:
> Chris Spencer wrote:
>> If an XML parser reads in and then writes out a document without having altered it, then the new document should be the same as the original.
> says who?

Good question. There is no One True Answer even within the XML standards. It all boils down to how you define "the same". Which parts of the XML document are meaningful content that needs to be preserved and which ones are mere encoding variations that may be omitted from the internal representation?

Some relevant references which may be used as guidelines:

* http://www.w3.org/TR/xml-infoset
  The XML infoset defines 11 types of information items including document type declaration, notations and other features. It does not appear to be suitable for a lightweight API like ElementTree.

* http://www.w3.org/TR/xpath-datamodel
  The XPath data model uses a subset of the XML infoset with only seven node types.

* http://www.w3.org/TR/xml-c14n
  The canonical XML recommendation is meant to describe a process but it also effectively defines a data model: anything preserved by the canonicalization process is part of the model. Anything not preserved is not part of the model.

In theory, this definition should be equivalent to the XPath data model since canonical XML is defined in terms of the XPath data model. In practice, the XPath data model defines properties not required for producing canonical XML (e.g. unparsed entities associated with the document node). I like this alternative black box definition because it provides a simple touchstone for determining what is or isn't part of the model.

I think it would be a good goal for ElementTree to aim for compliance with the canonical XML data model. It's already quite close.

It's possible to use the canonical XML data model without being a canonical XML processor, but it would be nice if parse() followed by write() actually passed the canonical XML test vectors. It's the easiest way to demonstrate compliance conclusively.
So what changes are required to make ElementTree canonical?

1. PI nodes are already supported for output. Need an option to preserve them on parsing.
2. Comment nodes are already supported for output. Need an option to preserve them on parsing (canonical XML also defines a "no comments" canonical form).
3. Preserve comments and PIs outside the root element (store them as children of the ElementTree object?).
4. Sorting of attributes by canonical order.
5. Minor formatting and spacing issues in opening tags.

oh, and one more thing...

6. Preserve namespace prefixes ;-) (see http://www.w3.org/TR/xml-c14n#NoNSPrefixRewriting for rationale)
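For readers on a current Python, several items on this list can be tried out directly. The following is a minimal sketch, assuming Python 3.8 or later, where the standard-library xml.etree.ElementTree offers canonicalize() and TreeBuilder options for comments and PIs. Note that canonicalize() implements C14N 2.0, a later revision than the recommendation linked above, and the sample document and its `urn:example:ns` URI are made up for illustration:

```python
import xml.etree.ElementTree as ET  # canonicalize() and insert_comments need Python 3.8+

# Hypothetical sample document; the namespace URI is made up for illustration.
SRC = '<p:doc xmlns:p="urn:example:ns" b="2" a="1"><!-- note --><p:item/></p:doc>'

# C14N 2.0 output: attributes sorted, empty elements expanded, prefixes kept.
canon = ET.canonicalize(SRC)
print(canon)

# Comments are dropped by default; with_comments=True keeps them.
canon_with_comments = ET.canonicalize(SRC, with_comments=True)
print(canon_with_comments)

# Preserving comments and PIs on parsing is opt-in via the TreeBuilder target.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(SRC, parser=ET.XMLParser(target=builder))
kept_comments = [el.text for el in root.iter() if el.tag is ET.Comment]
print(kept_comments)
```

The sketch covers items 1, 2, 4 and 6 only for canonical output and opt-in parsing; ordinary parse() followed by write() still behaves as discussed in this thread.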
Re: ElementTree Namespace Prefixes
Oren Tirosh wrote:
> It all boils down to how you define "the same". Which parts of the XML document are meaningful content that needs to be preserved and which ones are mere encoding variations that may be omitted from the internal representation?
>
> Some relevant references which may be used as guidelines:
>
> * http://www.w3.org/TR/xml-infoset
>   The XML infoset defines 11 types of information items including document type declaration, notations and other features. It does not appear to be suitable for a lightweight API like ElementTree.
>
> * http://www.w3.org/TR/xpath-datamodel
>   The XPath data model uses a subset of the XML infoset with only seven node types.
>
> * http://www.w3.org/TR/xml-c14n
>   The canonical XML recommendation is meant to describe a process but it also effectively defines a data model: anything preserved by the canonicalization process is part of the model. Anything not preserved is not part of the model.

you forgot http://effbot.org/zone/element-infoset.htm which describes the 3-node XML infoset subset used by ElementTree.

/F
ElementTree Namespace Prefixes
Does anyone know how to make ElementTree preserve namespace prefixes in parsed XML files? The default behavior is to strip a document of all prefixes and then replace them with autogenerated prefixes like ns0, ns1, etc. The correct behavior should be to write the file in the form that it was read, which it seems to do correctly for everything except namespace prefixes. The docs mention that proper output can be achieved by using the QName object, but they don't go into any detail. Any help is appreciated.

Thanks,
Chris Spencer
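The behaviour in question can be reproduced with a minimal sketch, assuming a current Python 3 and its standard-library xml.etree.ElementTree (the `urn:example:feed` URI and the `atom` prefix are made up for illustration). It also shows register_namespace(), which sets a preferred output prefix for a URI; it does not remember the prefixes a particular file used:

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the namespace URI is made up for illustration.
SRC = '<atom:feed xmlns:atom="urn:example:feed"><atom:entry/></atom:feed>'

root = ET.fromstring(SRC)

# In memory, tags carry the namespace URI in Clark notation, not the prefix.
print(root.tag)  # {urn:example:feed}feed

# With no preferred prefix for the URI, the serializer generates one.
auto = ET.tostring(root, encoding="unicode")
print(auto)

# register_namespace() records a process-wide preferred prefix for a URI.
ET.register_namespace("atom", "urn:example:feed")
named = ET.tostring(root, encoding="unicode")
print(named)
```

Because the registry is global and keyed by URI, this works as a workaround when the expected prefixes are known in advance, not as true round-tripping.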
Re: ElementTree Namespace Prefixes
On Sun, 12 Jun 2005 15:06:18 +, Chris Spencer wrote:
> Does anyone know how to make ElementTree preserve namespace prefixes in parsed xml files?

See the recent c.l.python thread titled "ElemenTree and namespaces" and started May 16 2:03pm. One archive is at
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/31b2e9f4a8f7338c/363f46513fb8de04?rnum=3hl=en

Andrew
[EMAIL PROTECTED]
Re: ElementTree Namespace Prefixes
Andrew Dalke wrote:
> On Sun, 12 Jun 2005 15:06:18 +, Chris Spencer wrote:
>> Does anyone know how to make ElementTree preserve namespace prefixes in parsed xml files?
> See the recent c.l.python thread titled "ElemenTree and namespaces" and started May 16 2:03pm. One archive is at
> http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/31b2e9f4a8f7338c/363f46513fb8de04?rnum=3hl=en

Thanks, although that thread didn't seem to resolve the issue. All the first few links talk about is how to hack your own parser to make sense of the Clark notation.

The problem at hand is with how ElementTree outputs namespaces and represents the tag name in memory.

Given XML with no namespaces, ElementTree works perfectly. However, if you give the root tag an xmlns attribute, ElementTree relabels all child nodes with its own prefix, completely defeating the purpose of the default namespace. In my opinion, this is unacceptable behavior.

If an XML parser reads in and then writes out a document without having altered it, then the new document should be the same as the original. With ElementTree this isn't so. Lundh apparently believes he knows better than you and I on how our namespaces should be represented.

It's a shame the default ns behavior in ElementTree is in such a poor state. I'm surprised no one's forked ElementTree solely to fix this issue.

Anyways, Python's native minidom works as expected, so I'll probably use that instead, even if the API is slightly less intuitive.

Chris
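The difference described here can be seen side by side in a minimal sketch, assuming a current Python 3 standard library (the `urn:example:default` URI is made up for illustration). ElementTree folds the default namespace into each tag name and invents a prefix on output, while minidom keeps the xmlns declaration as an ordinary attribute and writes the tag names as they were read:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical input using a default namespace on the root element.
SRC = '<root xmlns="urn:example:default"><child/></root>'

# ElementTree: child tags are stored as {uri}name and serialized with a
# generated prefix, since no preferred prefix is registered for this URI.
et_root = ET.fromstring(SRC)
et_out = ET.tostring(et_root, encoding="unicode")
print(et_root[0].tag)
print(et_out)

# minidom: the xmlns attribute and the unprefixed tag names survive as written.
dom_out = minidom.parseString(SRC).documentElement.toxml()
print(dom_out)
```

Both outputs describe the same namespaced elements; only the minidom one keeps the original spelling.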
Re: ElementTree Namespace Prefixes
Chris Spencer wrote:
> Given xml with no namespaces, ElementTree works perfectly. However, if you give the root tag an xmlns attribute, ElementTree relabels all child nodes with its own prefix, completely defeating the purpose of the default namespace. In my opinion, this is unacceptable behavior.

There is no functional difference between a default namespace and a "normal" (prefixed) namespace. Replacing the default with a prefixed one has no effect on document processing (the namespace doesn't change, only the prefix), although it looks different to humans. Anyway, XML is for machines, not for humans.

> If an XML parser reads in and then writes out a document without having altered it, then the new document should be the same as the original. With ElementTree this isn't so. Lundh apparently believes he knows better than you and I on how our namespaces should be represented.

No, this is perfectly valid behaviour. Go see the spec.

> It's a shame the default ns behavior in ElementTree is in such a poor state. I'm surprised no one's forked ElementTree solely to fix this issue.

There is at least one ElementTree API implementation that retains prefixes, lxml.etree. Go google for it.

--
Jarek Zgoda
http://jpa.berlios.de/ | http://www.zgodowie.org/
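The claim that only the prefix changes, not the namespace, can be checked with a minimal sketch, assuming a current Python 3 and its standard-library xml.etree.ElementTree. The two documents, the `urn:example:ns` URI and the helper name clark_tags are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Two hypothetical spellings of the same namespaced document.
DEFAULT_NS = '<root xmlns="urn:example:ns"><child/></root>'
PREFIXED = '<p:root xmlns:p="urn:example:ns"><p:child/></p:root>'


def clark_tags(xml_text):
    """Return every element tag in document order, in Clark notation."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


tags_default = clark_tags(DEFAULT_NS)
tags_prefixed = clark_tags(PREFIXED)

print(tags_default)   # ['{urn:example:ns}root', '{urn:example:ns}child']
print(tags_prefixed)  # ['{urn:example:ns}root', '{urn:example:ns}child']
```

This equivalence holds for element and attribute names; it says nothing about prefixes that appear inside attribute values or text, which is where the other posters' objections come from.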