Re: xml.dom.minidom question
nivashno, 19.11.2011 22:32:
> I've got this code:
>
>     import xml.dom.minidom
>     dom = xml.dom.minidom.parse('myfile.xml')
>     for testnode in dom.getElementsByTagName('tests')[0].childNodes:
>         print testnode
>
> When it's working on this xml:
>
>     <tests>
>      <test>something</test>
>     </tests>
>
> I get the following:
>
>     <DOM Text node u'\n '>
>     <DOM Element: test at 0xaa6bfac>
>     <DOM Text node u'\n'>
>
> But when it's working on this xml:
>
>     <tests><test>something</test></tests>
>
> I get this:
>
>     <DOM Element: test at 0xbc6f1ac>
>
> I always thought that xml was very precisely split up into nodes,
> childnodes, etc, no matter what the whitespace between them was. But
> apparently not, or am I missing something?

You already got some answers to this question. I'd just like to point you to the xml.etree.(c)ElementTree packages, which are substantially faster and easier to use than minidom.

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
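
For what it's worth, here is a minimal sketch of how the same two documents look through xml.etree.ElementTree. The XML strings are reconstructions of the ones in the question (the exact indentation is an assumption), and child_tags is just a helper name made up for this example. Iterating over an element yields only its child elements; the whitespace ends up in the .text and .tail attributes instead of in separate nodes.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the question; the indentation width is an assumption.
pretty = "<tests>\n  <test>something</test>\n</tests>"
compact = "<tests><test>something</test></tests>"


def child_tags(source):
    """Return the tag names of the root's child elements."""
    # Iterating an Element yields child elements only, never text.
    return [child.tag for child in ET.fromstring(source)]


pretty_tags = child_tags(pretty)
compact_tags = child_tags(compact)
print(pretty_tags, compact_tags)

# The whitespace is not discarded: it is stored on the elements.
root = ET.fromstring(pretty)
leading = root.text       # whitespace between <tests> and <test>
trailing = root[0].tail   # whitespace between </test> and </tests>
print(repr(leading), repr(trailing))
```

So the loop body sees the same children for both layouts, while the whitespace is still available if an application cares about it.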
Re: xml.dom.minidom question
nivashno <nivashno@domain.invalid> writes:
> I always thought that xml was very precisely split up into nodes,
> childnodes, etc, no matter what the whitespace between them was. But
> apparently not, or am I missing something?

The whitespace in your example becomes part of a data element.
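
A small sketch that makes this visible, assuming the two documents from the question look roughly like the strings below (the indentation is a guess, and describe_children is a name invented for this example). It lists each child of the tests element together with its node type, so the whitespace shows up as text nodes of its own.

```python
import xml.dom.minidom

# Reconstructed from the question; the indentation width is an assumption.
pretty = "<tests>\n  <test>something</test>\n</tests>"
compact = "<tests><test>something</test></tests>"


def describe_children(source):
    """List the children of <tests> as (kind, value) pairs."""
    dom = xml.dom.minidom.parseString(source)
    tests = dom.getElementsByTagName("tests")[0]
    described = []
    for node in tests.childNodes:
        if node.nodeType == node.TEXT_NODE:
            described.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            described.append(("element", node.tagName))
    return described


pretty_children = describe_children(pretty)
compact_children = describe_children(compact)
print(pretty_children)
print(compact_children)
```

With the indented document the element is surrounded by two whitespace-only text nodes; with the compact one it is the only child.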
Re: xml.dom.minidom question
On Sat, 19 Nov 2011 15:32:18 -0600, nivashno wrote:
> I always thought that xml was very precisely split up into nodes,
> childnodes, etc, no matter what the whitespace between them was. But
> apparently not, or am I missing something?

XML allows mixed content (an element's children can be a mixture of text and elements). Formats such as XHTML wouldn't be possible otherwise.

A validating parser will know from the schema whether an element can contain mixed content, and can use this knowledge to elide whitespace-only text nodes within elements which don't have mixed content (however, that doesn't mean that it will, or even that it should; some applications may prefer to retain the whitespace in order to preserve formatting).

A non-validating parser (which doesn't use a schema) doesn't know whether an element contains mixed content, so it has to retain all text nodes in case they're significant.

The Python standard library doesn't include a validating XML parser. xmlproc seems to be the preferred validating parser. It has a separate handle_ignorable_data() method for reporting whitespace-only text nodes within non-mixed-content elements; the handle_data() method is only called for significant text.
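
Since minidom is non-validating, an application that knows its elements never hold mixed content can drop the whitespace-only text nodes itself. The following is only a sketch of that idea, not something minidom provides: significant_children is a hypothetical helper, and the sample documents are made up for illustration.

```python
import xml.dom.minidom


def significant_children(element):
    """Return the child nodes, skipping whitespace-only text nodes."""
    kept = []
    for node in element.childNodes:
        # A text node whose data is empty after strip() is pure whitespace.
        if node.nodeType == node.TEXT_NODE and not node.data.strip():
            continue
        kept.append(node)
    return kept


# Element-only content: the surrounding whitespace is dropped.
dom = xml.dom.minidom.parseString("<tests>\n  <test>something</test>\n</tests>")
tests = dom.getElementsByTagName("tests")[0]
names = [node.nodeName for node in significant_children(tests)]
print(names)

# Mixed content: text that carries real characters is kept.
mixed = xml.dom.minidom.parseString("<p>Hello <b>world</b>\n</p>")
mixed_names = [node.nodeName for node in significant_children(mixed.documentElement)]
print(mixed_names)
```

Note that this filter is the application's decision, not the parser's: applied to a format where whitespace between inline elements matters, it would silently drop significant text.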