On Sat, 19 Nov 2011 15:32:18 -0600, nivashno wrote:

> I always thought that xml was very precisely split up into nodes,
> childnodes, etc, no matter what the whitespace between them was. But
> apparently not, or am I missing something?

XML allows mixed content (an element's children can be a mixture of text and elements). Formats such as XHTML wouldn't be possible otherwise.

A validating parser will know from the schema (for example, a DTD) whether an element can contain mixed content, and can use this knowledge to elide whitespace-only text nodes within elements which don't have mixed content. However, that doesn't mean that it will, or even that it should; some applications may prefer to retain the whitespace in order to preserve formatting.

A non-validating parser (which doesn't use a schema) doesn't know whether an element contains mixed content, so it has to retain all text nodes in case they're significant.

The Python standard library doesn't include a validating XML parser. xmlproc seems to be the preferred validating parser. It has a separate handle_ignorable_data() method for reporting whitespace-only text nodes within non-mixed-content elements; the handle_data() method is only called for "significant" text.
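To illustrate the non-validating case, here is a minimal sketch using xml.dom.minidom from the standard library. The sample document and the helper names child_summary() and strip_whitespace_nodes() are invented for this example; they are not part of minidom or xmlproc. The stripping step is only appropriate when you already know, by other means, that the elements involved don't have mixed content.

```python
from xml.dom import minidom

# Hypothetical sample document: the indentation between the tags is
# ordinary character data as far as a non-validating parser is concerned.
DOC = """<root>
  <item>one</item>
  <item>two</item>
</root>"""


def child_summary(node):
    """Return a (kind, value) pair for each text or element child of node."""
    out = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            out.append(("text", child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            out.append(("element", child.tagName))
    return out


def strip_whitespace_nodes(node):
    """Recursively remove whitespace-only text nodes below node.

    Assumption: the caller knows these elements have no mixed content,
    so whitespace-only text carries no meaning.
    """
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)


root = minidom.parseString(DOC).documentElement
before = child_summary(root)   # includes whitespace-only text nodes
strip_whitespace_nodes(root)
after = child_summary(root)    # only the two <item> elements remain

print(before)
print(after)
```

Before stripping, the children of <root> include whitespace-only text nodes between the <item> elements; afterwards only the elements are left, while the "one" and "two" text inside each <item> is untouched because it isn't whitespace-only.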