Re: xml.dom.minidom question

2011-11-20 Thread Stefan Behnel

nivashno, 19.11.2011 22:32:

I've got this code:

>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parse('myfile.xml')
>>> for testnode in dom.getElementsByTagName('tests')[0].childNodes:
...     print testnode

When it's working on this xml:

 <tests>
   <test>something</test>
 </tests>

I get the following:

<DOM Text node u'\n  '>
<DOM Element: test at 0xaa6bfac>
<DOM Text node u'\n'>

But when it's working on this xml:

 <tests><test>something</test></tests>

I get this:

<DOM Element: test at 0xbc6f1ac>


I always thought that xml was very precisely split up into nodes,
childnodes, etc, no matter what the whitespace between them was. But
apparently not, or am I missing something?


You already got some answers to this question. I'd just like to point you 
to the xml.etree.(c)ElementTree packages, which are substantially faster 
and easier to use than minidom.
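
As a rough illustration of the ElementTree approach (a minimal sketch in Python 3, not code from the original post; the names PRETTY, COMPACT and child_tags are made up for this example), iterating over an element yields only its child elements. The whitespace is still there, but it is stored in the .text and .tail attributes instead of in separate nodes:

```python
import xml.etree.ElementTree as ET

# The two layouts from the question (hypothetical names, inline strings
# instead of 'myfile.xml' so the sketch is self-contained).
PRETTY = "<tests>\n  <test>something</test>\n</tests>"
COMPACT = "<tests><test>something</test></tests>"

def child_tags(xml_text):
    """Return the tag names of the root element's child elements."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]

pretty_root = ET.fromstring(PRETTY)

# Both layouts give the same list of child elements.
print(child_tags(PRETTY))
print(child_tags(COMPACT))

# The whitespace is not lost: it sits in .text (before the first child)
# and in .tail (after a child's closing tag).
print(repr(pretty_root.text))
print(repr(pretty_root[0].tail))
```

Code that only cares about the elements can simply ignore .text and .tail.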


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: xml.dom.minidom question

2011-11-19 Thread Paul Rubin
nivashno <nivashno@domain.invalid> writes:
> I always thought that xml was very precisely split up into nodes,
> childnodes, etc, no matter what the whitespace between them was. But
> apparently not, or am I missing something?

The whitespace in your example is character data, so the parser returns
it as text nodes among the element's children.


Re: xml.dom.minidom question

2011-11-19 Thread Nobody
On Sat, 19 Nov 2011 15:32:18 -0600, nivashno wrote:

> I always thought that xml was very precisely split up into nodes,
> childnodes, etc, no matter what the whitespace between them was. But
> apparently not, or am I missing something?

XML allows mixed content (an element's children can be a mixture of text
and elements). Formats such as XHTML wouldn't be possible otherwise.

A validating parser will know from the schema whether an element can
contain mixed content, and can use this knowledge to elide whitespace-only
text nodes within elements which don't have mixed content (however, that
doesn't mean that it will, or even that it should; some applications may
prefer to retain the whitespace in order to preserve formatting).

A non-validating parser (which doesn't use a schema) doesn't know whether
an element contains mixed content, so it has to retain all text nodes in
case they're significant.
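
To make this concrete, here is a small sketch in Python 3 (the original post used Python 2; the helper names element_children and is_blank_text are invented for this example). It shows minidom, which is non-validating, keeping the whitespace-only text nodes, and one way an application might filter them out itself when it knows they are not significant:

```python
from xml.dom import minidom

# The two layouts from the question, as inline strings.
PRETTY = "<tests>\n  <test>something</test>\n</tests>"
COMPACT = "<tests><test>something</test></tests>"

def element_children(node):
    """Return only the child nodes that are elements."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]

def is_blank_text(node):
    """True for a text node that contains nothing but whitespace."""
    return node.nodeType == node.TEXT_NODE and not node.data.strip()

pretty_tests = minidom.parseString(PRETTY).documentElement
compact_tests = minidom.parseString(COMPACT).documentElement

# The indented layout has extra text nodes around the element;
# the compact layout has only the element.
print(len(pretty_tests.childNodes))
print(len(compact_tests.childNodes))

# Filtering by node type gives the same result for both layouts.
print([c.tagName for c in element_children(pretty_tests)])
print([c.tagName for c in element_children(compact_tests)])
```

Filtering on nodeType is an application-level decision: the parser cannot make it, because without a schema it cannot tell whether the whitespace matters.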

The Python standard library doesn't include a validating XML parser.
xmlproc seems to be the preferred validating parser. That has a separate
handle_ignorable_data() method for reporting whitespace-only text nodes
within non-mixed-content elements; the handle_data() method is only called
for significant text.
